Posted to user@nutch.apache.org by Prashant Ladha <pr...@gmail.com> on 2012/12/02 23:31:07 UTC

Local Trunk Build - java.io.IOException: Job failed!

Hi,
Earlier I was on Windows 7 and hitting an exception that nobody else had seen,
so I moved to Ubuntu.
But here I am seeing the error message below.
Can you help me figure out what I could be doing wrong?


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/prashant/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/prashant/workspaceNutchTrunk/trunk/build/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
solrUrl is not set, indexing will be skipped...
crawl started in: crawl
rootUrlDir = urls
threads = 10
depth = 3
solrUrl=null
topN = 50
Injector: starting at 2012-12-02 17:25:13
Injector: crawlDb: crawl/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: total number of urls rejected by filters: 0
Injector: total number of urls injected after normalization and filtering: 0
Injector: Merging injected urls into crawl db.
Injector: finished at 2012-12-02 17:25:28, elapsed: 00:00:14
Generator: starting at 2012-12-02 17:25:28
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: true
Generator: normalizing: true
Generator: topN: 50
Generator: jobtracker is 'local', generating exactly one partition.
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
at org.apache.nutch.crawl.Generator.generate(Generator.java:551)
at org.apache.nutch.crawl.Generator.generate(Generator.java:456)
at org.apache.nutch.crawl.Crawl.run(Crawl.java:130)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

Re: hung threads in big nutch crawl process

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Thanks, Markus. I will try it step by step.

RE: hung threads in big nutch crawl process

Posted by Markus Jelsma <ma...@openindex.io>.
This page explains the individual steps:
http://wiki.apache.org/nutch/NutchTutorial#A3.2_Using_Individual_Commands_for_Whole-Web_Crawling
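
Roughly, the individual steps from that page look like the following in local mode (directory names, the topN value and the Solr URL are only examples, adjust them to your setup):

# inject the seed list into the crawldb
bin/nutch inject crawl/crawldb urls
# select the next batch of URLs and create a new segment
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
s1=`ls -d crawl/segments/2* | tail -1`
# fetch, parse and update the crawldb with the results
bin/nutch fetch $s1
bin/nutch parse $s1
bin/nutch updatedb crawl/crawldb $s1
# build the linkdb and push the segment to Solr
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch solrindex http://localhost:8080/solr crawl/crawldb -linkdb crawl/linkdb $s1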
 
 

RE: hung threads in big nutch crawl process

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Thank you, Markus, for your answer.
I have always used Nutch from the console, running a complete cycle:
bin/nutch crawl urls -dir crawl -depth 10 -topN 100000 -solr http://localhost:8080/solr
Could you explain how to run the steps as separate processes? I was reading the wiki, but it did not work for me because I don't understand the commands. I also want to use Nutch in distributed mode; could you point me to good documentation for that?

_____________________________________________________________________
Ing. Eyeris Rodriguez Rueda
Telephone: 837-3370
Universidad de las Ciencias Informáticas
_____________________________________________________________________


RE: hung threads in big nutch crawl process

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - Hadoop manages some threads of its own, but in Nutch the only job that uses its own worker threads is the fetcher. Parsing is done through an executor service.

It is quite possible that some of your regexes are very complex and Nutch takes a long time processing them, especially if you parse inside the fetcher job.

You should run the Nutch jobs separately to find out which job is giving you trouble.
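
If a step really looks stuck, a thread dump usually shows where the time is going. Something along these lines should work (jps and jstack ship with the JDK; the grep pattern is only an example):

# find the PID of the local Nutch/Hadoop process
jps -l
# dump its thread stacks and look for fetcher or regex frames
jstack <pid> > threads.txt
grep -n -B 2 -A 8 "java.util.regex" threads.txt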


hung threads in big nutch crawl process

Posted by Eyeris Rodriguez Rueda <er...@uci.cu>.
Hi all.
I have noticed that in a big Nutch crawl (depth: 10, topN: 100,000) some threads appear to hang in parts of the crawl cycle, for example while normalizing URLs by regex and while fetching.
I'm using Nutch 1.5.1 and Solr 3.6.
RAM: 2 GB
CPU: Core i3
OS: Ubuntu 12.04 (server)

I have a question: how does Nutch manage threads during a crawl cycle?
Are the generate, fetch and parse steps multithreaded?

PS: Sorry for my English; it is not my native language.


Re: Local Trunk Build - java.io.IOException: Job failed!

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Prashant,

Do you require wiki write access?

Best

Lewis

-- 
Lewis

Re: Local Trunk Build - java.io.IOException: Job failed!

Posted by Prashant Ladha <pr...@gmail.com>.
I found a possible solution. I ended up modifying the nutch-default.xml
file to hard-code the plugin.folders path. [0]
If everyone ends up doing the same thing, then we should add it to the
installation guide. [1]

[0]
<property>
  <name>plugin.folders</name>
  <!-- <value>plugins</value> -->
  <value>/home/prashant/workspaceNutchTrunk/trunk/build/plugins</value>
  <description>Directories where nutch plugins are located.  Each
  element may be a relative or absolute path.  If absolute, it is used
  as is.  If relative, it is searched for on the classpath.</description>
</property>

This was already discussed in the Nutch JIRA:
https://issues.apache.org/jira/browse/NUTCH-937

[1]
http://wiki.apache.org/nutch/RunNutchInEclipse
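
As a sanity check (the path below is just my local example), you can confirm that the directory plugin.folders points at really exists after building:

# from the trunk checkout: the ant build should populate build/plugins
ant
# verify the directory referenced by plugin.folders is present
ls /home/prashant/workspaceNutchTrunk/trunk/build/plugins | head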



Re: Local Trunk Build - java.io.IOException: Job failed!

Posted by Prashant Ladha <pr...@gmail.com>.
Hi Markus.
After sending the email, I went through the instructions at link [0] again.
There I saw the instruction to look at the hadoop.log file; looking at the
log file, I found [1].
Are there any native Hadoop libraries that we have to install?

I am on Ubuntu 12.1, JDK 1.7 and trunk Nutch.

[0] http://wiki.apache.org/nutch/RunNutchInEclipse
[1] attached hadoop.log
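
In case it is useful to others, this is roughly how I look at that log (assuming the default logs/hadoop.log location under the directory Nutch is run from):

# show the tail of the log written by the local job
tail -n 200 logs/hadoop.log
# or just the most recent errors
grep -n "ERROR\|Exception" logs/hadoop.log | tail -n 20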



RE: Local Trunk Build - java.io.IOException: Job failed!

Posted by Markus Jelsma <ma...@openindex.io>.
Hi - please provide the log output and your version number.
 