Posted to user@nutch.apache.org by Jason Ma <ra...@gmail.com> on 2007/07/03 19:27:14 UTC

Indexing exits with Job Failed

I'm running Nutch on RedHat Linux with Java 1.6.0_01.  I have
successfully crawled and indexed smaller quantities of data in the
past.  However, after I tried to scale up the crawling, Nutch would
give an exception when indexing (the bottom of the log included
below).  Please let me know if there's more information I should
provide.

I'd be very grateful for any suggestions or advice you may have.

Thanks in advance,
Jason Ma

....

 Indexing [http://64.13.133.31/pics/up-VC2GQ0CA9QSHHGHM-s] with
analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
(null)
 Indexing [http://64.13.133.31/pics/up-VET16648L9TBU53B-s] with
analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
(null)
 Indexing [http://64.13.133.31/pics/up-VHIUOB6N8CVESR52-s] with
analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
(null)
 Indexing [http://64.13.133.31/pics/user_promo_mini.png] with analyzer
org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259 (null)
Optimizing index.
merging segments _73 (1 docs) _74 (1 docs) _75 (1 docs) _76 (1 docs)
_77 (1 docs) _78 (1 docs) _79 (1 docs) _7a (1 docs) _7b (1 docs) _7c
(1 docs) _7d (1 docs) _7e (1 docs) _7f (1 docs) _7g (1 docs) _7h (1
docs) _7i (1 docs) _7j (1 docs) _7k (1 docs) _7l (1 docs) _7m (1 docs)
_7n (1 docs) _7o (1 docs) _7p (1 docs) _7q (1 docs) into _7r (24 docs)
merging segments _1e (50 docs) _2t (50 docs) _48 (50 docs) _5n (50
docs) _72 (50 docs) _7r (24 docs) into _7s (274 docs)
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)

Re: Indexing exits with Job Failed

Posted by Dennis Kubes <ku...@apache.org>.
As Doğacan stated we would need to see the error in the log files.  But 
if you have crawled smaller quantities and are scaling up and only now 
is it failing, it may be an OutOfMemoryException, in which case you can 
change the mapred.child.java.opts in the hadoop-site.xml file to a 
higher value, say -Xmx512M (we have ours set for -Xmx1024M).
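
For reference, a minimal sketch of that setting as it might appear inside 
the <configuration> element of hadoop-site.xml (only the property name 
comes from the advice above; the description text is just illustrative):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512M</value>
    <description>JVM options passed to each map/reduce child task;
    raising -Xmx gives the indexing tasks more heap.</description>
  </property>

If you are running everything in a single local JVM rather than on a 
cluster, the NUTCH_HEAPSIZE environment variable read by bin/nutch may be 
the more relevant knob instead.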

Dennis Kubes

Jason Ma wrote:
> I'm running Nutch on RedHat Linux with Java 1.6.0_01.  I have
> successfully crawled and indexed smaller quantities of data in the
> past.  However, after I tried to scale up the crawling, Nutch would
> give an exception when indexing (the bottom of the log included
> below).  Please let me know if there's more information I should
> provide.
> 
> I'd be very grateful for any suggestions or advice you may have.
> 
> Thanks in advance,
> Jason Ma
> 
> ....
> 
> Indexing [http://64.13.133.31/pics/up-VC2GQ0CA9QSHHGHM-s] with
> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
> (null)
> Indexing [http://64.13.133.31/pics/up-VET16648L9TBU53B-s] with
> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
> (null)
> Indexing [http://64.13.133.31/pics/up-VHIUOB6N8CVESR52-s] with
> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
> (null)
> Indexing [http://64.13.133.31/pics/user_promo_mini.png] with analyzer
> org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259 (null)
> Optimizing index.
> merging segments _73 (1 docs) _74 (1 docs) _75 (1 docs) _76 (1 docs)
> _77 (1 docs) _78 (1 docs) _79 (1 docs) _7a (1 docs) _7b (1 docs) _7c
> (1 docs) _7d (1 docs) _7e (1 docs) _7f (1 docs) _7g (1 docs) _7h (1
> docs) _7i (1 docs) _7j (1 docs) _7k (1 docs) _7l (1 docs) _7m (1 docs)
> _7n (1 docs) _7o (1 docs) _7p (1 docs) _7q (1 docs) into _7r (24 docs)
> merging segments _1e (50 docs) _2t (50 docs) _48 (50 docs) _5n (50
> docs) _72 (50 docs) _7r (24 docs) into _7s (274 docs)
> Exception in thread "main" java.io.IOException: Job failed!
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>        at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
>        at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)

Re: Indexing exits with Job Failed

Posted by carmmello <ca...@globo.com>.
From my experience, most of the time if you restart your computer and rerun the 
commands (updatedb, invertlinks, index) you can get the job done, using the 
0.9 version.  However, with some of the nightly builds (around the end of 
June) the indexing phase was nearly impossible to accomplish.
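
For anyone unsure of those steps, a rough sketch of the 0.9 command 
sequence (the crawl/ layout below is only an example; substitute your own 
crawldb, linkdb, segments and index paths):

  # bring the crawldb up to date with the fetched segments
  bin/nutch updatedb crawl/crawldb crawl/segments/*

  # rebuild the linkdb from the segments
  bin/nutch invertlinks crawl/linkdb crawl/segments/*

  # index into a fresh directory (a half-written crawl/indexes left over
  # from the failed run may need to be moved aside first)
  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb crawl/segments/*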


----- Original Message ----- 
From: "Doğacan Güney" <do...@gmail.com>
To: <nu...@lucene.apache.org>
Sent: Monday, July 09, 2007 4:06 AM
Subject: Re: Indexing exits with Job Failed


> Hi,
>
> On 7/3/07, Jason Ma <ra...@gmail.com> wrote:
>> I'm running Nutch on RedHat Linux with Java 1.6.0_01.  I have
>> successfully crawled and indexed smaller quantities of data in the
>> past.  However, after I tried to scale up the crawling, Nutch would
>> give an exception when indexing (the bottom of the log included
>> below).  Please let me know if there's more information I should
>> provide.
>>
>> I'd be very grateful for any suggestions or advice you may have.
>>
>> Thanks in advance,
>> Jason Ma
>>
>> ....
>>
>>  Indexing [http://64.13.133.31/pics/up-VC2GQ0CA9QSHHGHM-s] with
>> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
>> (null)
>>  Indexing [http://64.13.133.31/pics/up-VET16648L9TBU53B-s] with
>> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
>> (null)
>>  Indexing [http://64.13.133.31/pics/up-VHIUOB6N8CVESR52-s] with
>> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
>> (null)
>>  Indexing [http://64.13.133.31/pics/user_promo_mini.png] with analyzer
>> org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259 (null)
>> Optimizing index.
>> merging segments _73 (1 docs) _74 (1 docs) _75 (1 docs) _76 (1 docs)
>> _77 (1 docs) _78 (1 docs) _79 (1 docs) _7a (1 docs) _7b (1 docs) _7c
>> (1 docs) _7d (1 docs) _7e (1 docs) _7f (1 docs) _7g (1 docs) _7h (1
>> docs) _7i (1 docs) _7j (1 docs) _7k (1 docs) _7l (1 docs) _7m (1 docs)
>> _7n (1 docs) _7o (1 docs) _7p (1 docs) _7q (1 docs) into _7r (24 docs)
>> merging segments _1e (50 docs) _2t (50 docs) _48 (50 docs) _5n (50
>> docs) _72 (50 docs) _7r (24 docs) into _7s (274 docs)
>> Exception in thread "main" java.io.IOException: Job failed!
>>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
>>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)
>>
>
> This exception is just the job runner telling us that your job has
> failed; it doesn't show where the actual problem is. Check your
> logs/hadoop.log or your tasktrackers' log files and you should see a
> more detailed error message for your problem.
>
> -- 
> Doğacan Güney
>


Re: Indexing exits with Job Failed

Posted by Doğacan Güney <do...@gmail.com>.
Hi,

On 7/3/07, Jason Ma <ra...@gmail.com> wrote:
> I'm running Nutch on RedHat Linux with Java 1.6.0_01.  I have
> successfully crawled and indexed smaller quantities of data in the
> past.  However, after I tried to scale up the crawling, Nutch would
> give an exception when indexing (the bottom of the log included
> below).  Please let me know if there's more information I should
> provide.
>
> I'd be very grateful for any suggestions or advice you may have.
>
> Thanks in advance,
> Jason Ma
>
> ....
>
>  Indexing [http://64.13.133.31/pics/up-VC2GQ0CA9QSHHGHM-s] with
> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
> (null)
>  Indexing [http://64.13.133.31/pics/up-VET16648L9TBU53B-s] with
> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
> (null)
>  Indexing [http://64.13.133.31/pics/up-VHIUOB6N8CVESR52-s] with
> analyzer org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259
> (null)
>  Indexing [http://64.13.133.31/pics/user_promo_mini.png] with analyzer
> org.apache.nutch.analysis.NutchDocumentAnalyzer@13b0259 (null)
> Optimizing index.
> merging segments _73 (1 docs) _74 (1 docs) _75 (1 docs) _76 (1 docs)
> _77 (1 docs) _78 (1 docs) _79 (1 docs) _7a (1 docs) _7b (1 docs) _7c
> (1 docs) _7d (1 docs) _7e (1 docs) _7f (1 docs) _7g (1 docs) _7h (1
> docs) _7i (1 docs) _7j (1 docs) _7k (1 docs) _7l (1 docs) _7m (1 docs)
> _7n (1 docs) _7o (1 docs) _7p (1 docs) _7q (1 docs) into _7r (24 docs)
> merging segments _1e (50 docs) _2t (50 docs) _48 (50 docs) _5n (50
> docs) _72 (50 docs) _7r (24 docs) into _7s (274 docs)
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
>         at org.apache.nutch.indexer.Indexer.index(Indexer.java:296)
>         at org.apache.nutch.indexer.Indexer.main(Indexer.java:313)
>

This exception is just the job runner telling us that your job has
failed; it doesn't show where the actual problem is. Check your
logs/hadoop.log or your tasktrackers' log files and you should see a
more detailed error message for your problem.
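
If it helps, a rough way to dig the real error out (this assumes a local, 
single-process run where everything ends up in logs/hadoop.log; on a 
distributed setup, look at the tasktracker logs on each slave node instead):

  # show the stack traces recorded around the failed indexing job
  grep -n -A 20 'Exception' logs/hadoop.log | tail -n 80

  # or simply read the end of the log right after the run fails
  tail -n 200 logs/hadoop.log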

-- 
Doğacan Güney