Posted to user@nutch.apache.org by Vishal Sharma <vi...@grazitti.com> on 2015/07/06 07:54:48 UTC

Getting IO exception on crawl with Apache nutch 1.9

Hi,

I am getting the following error when trying to crawl just one URL:

./bin/crawl ./urls/ ./CrawlData/ "." 5

Injector: starting at 2015-07-06 05:52:45
Injector: crawlDb: CrawlData/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)


This used to work earlier. Can someone please help?






*Vishal Sharma*
*Team Leader, SFDC*
T: +1 304 636 7373
E: vishals@grazitti.com
www.grazitti.com
LinkedIn: <http://www.linkedin.com/company/grazitti-interactive>
Twitter: <https://twitter.com/grazitti>
Facebook: <https://www.facebook.com/grazitti.interactive>

Re: Getting IO exception on crawl with Apache nutch 1.9

Posted by Imtiaz Shakil Siddique <sh...@gmail.com>.
Hi Vishal,

Nutch maintains a log file, located at
$Nutch_Home/logs/hadoop.log.
This log file holds much more detailed information about the crash.
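"Job failed!" on its own says little. The underlying exception and its "Caused by:" lines are normally in hadoop.log. The bash function below is a minimal sketch for pulling just the error lines, plus some trailing context, out of that file. The function name show_nutch_errors is hypothetical. It also assumes NUTCH_HOME points at your Nutch installation, so pass the log path explicitly if your logs live elsewhere.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print recent ERROR/Exception lines from Nutch's hadoop.log.
# Assumption: the log is at $NUTCH_HOME/logs/hadoop.log unless a path is given.
show_nutch_errors() {
  local log="${1:-${NUTCH_HOME:-.}/logs/hadoop.log}"
  if [ ! -f "$log" ]; then
    echo "log file not found: $log" >&2
    return 1
  fi
  # -E: extended regex; -A 15: also print the 15 lines after each match,
  # so the stack trace and any "Caused by:" lines are included.
  # tail keeps only the last 80 lines, i.e. the most recent failure.
  grep -E -A 15 'ERROR|Exception' "$log" | tail -n 80
}
```

Usage: run `show_nutch_errors` right after the failed crawl, or `show_nutch_errors /path/to/hadoop.log`, and post the output to the list.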

Can you please post the log file so that we can inspect it?
Thank you.
