Posted to user@nutch.apache.org by RP <rp...@earthlink.net> on 2006/12/15 17:32:20 UTC

Error on convert to 0.9 during mergesegs step

HELP - I'm hitting an error migrating from 0.8 to 0.9, following the 
procedure outlined in the wiki. It fails at the mergesegs step; the 
crawldb conversion went fine. I'm trying to salvage the segments, as 
I'm on a slow connection and it would be painful to re-crawl. Has 
anyone seen this, or have any ideas on how to get past it? This is 
Nutch on Fedora, using Java 1.5. Initial crawls on 0.9 seem to work 
just fine, but I haven't tried a mergesegs on the 0.9-fetched 
segments yet....
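
For reference, a sketch of the commands involved, reconstructed from 
the logs below (the converter class name and the exact mergesegs 
arguments are assumptions; check them against the wiki procedure):

# convert the 0.8 crawldb to the 0.9 format (this step worked)
bin/nutch org.apache.nutch.tools.compat.CrawlDbConverter crawl/crawldb.old crawl/crawldb
# merge the old segments into a converted one (this is the step that fails)
bin/nutch mergesegs crawl/converted8 -dir crawl/segments.old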

I'm also seeing this warning:

2006-12-15 00:28:22,061 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable.

I can see how to build the native library from the Hadoop source, but 
where does it go in the Nutch world? We've only got the Hadoop jar in 
lib. Does the native library bring any performance boost to the table?
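
A minimal sketch of where the built library would go, assuming Nutch 
resolves native code the way Hadoop's own bin/hadoop script does (the 
Linux-i386-32 platform string, the build path, and the NUTCH_OPTS 
pass-through are assumptions to verify against your install):

# copy the freshly built library into a Hadoop-style native directory
mkdir -p $NUTCH_HOME/lib/native/Linux-i386-32
cp $HADOOP_SRC/build/native/Linux-i386-32/lib/libhadoop.* \
   $NUTCH_HOME/lib/native/Linux-i386-32/
# point the JVM at it before invoking bin/nutch
export NUTCH_OPTS="-Djava.library.path=$NUTCH_HOME/lib/native/Linux-i386-32"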

Std Out:
SegmentMerger:   adding crawl/segments.old/20061209202120
SegmentMerger:   adding crawl/segments.old/20061212194145
SegmentMerger: using segment data from: content crawl_generate crawl_fetch
Exception in thread "main" java.io.IOException: Job failed!
       at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
       at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:547)
       at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:595)

From the Hadoop log:
2006-12-15 00:08:51,133 INFO  segment.SegmentMerger - Merging 29 segments to crawl/converted8/20061215000851
2006-12-15 00:08:51,345 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-0
2006-12-15 00:08:51,350 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-1
2006-12-15 00:08:51,389 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-10
2006-12-15 00:08:51,390 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-11
2006-12-15 00:08:51,391 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-12
2006-12-15 00:08:51,392 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-13
2006-12-15 00:08:51,394 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-14
2006-12-15 00:08:51,395 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-15
2006-12-15 00:08:51,396 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-16
2006-12-15 00:08:51,397 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-17
2006-12-15 00:08:51,400 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-18
2006-12-15 00:08:51,401 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-19
2006-12-15 00:08:51,405 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-2
2006-12-15 00:08:51,407 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-20
2006-12-15 00:08:51,408 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-21
2006-12-15 00:08:51,409 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-22
2006-12-15 00:08:51,442 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-23
2006-12-15 00:08:51,444 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-24
2006-12-15 00:08:51,445 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-25
2006-12-15 00:08:51,453 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-26
2006-12-15 00:08:51,455 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-3
2006-12-15 00:08:51,456 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-4
2006-12-15 00:08:51,457 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-5
2006-12-15 00:08:51,458 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-6
2006-12-15 00:08:51,459 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-7
2006-12-15 00:08:51,500 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-8
2006-12-15 00:08:51,501 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061208184550-9
2006-12-15 00:08:51,502 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061209202120
2006-12-15 00:08:51,503 INFO  segment.SegmentMerger - SegmentMerger:   adding crawl/segments.old/20061212194145
2006-12-15 00:08:51,505 INFO  segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch
2006-12-15 00:28:22,061 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2006-12-15 00:52:43,895 WARN  mapred.LocalJobRunner - job_dokmpz
java.lang.NullPointerException
       at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380)
       at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:136)
       at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:114)
       at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
       at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
       at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
       at java.io.DataInputStream.read(DataInputStream.java:80)
       at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:200)
       at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:192)
       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:75)
       at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:212)
       at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:373)
       at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:181)
       at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:211)
       at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:315)
       at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:137)

-- 
rp


Re: Error on convert to 0.9 during mergesegs step

Posted by RP <rp...@earthlink.net>.
Andrzej - Thanks, that conf switch seems to have taken care of it! 
Any idea whether the Hadoop native library gives any performance boost?

rp

Andrzej Bialecki wrote:
> RP wrote:
>> Pardon my ignorance, but where and how do I do this? I don't see 
>> anything in the conf files that looks like a switch....
>
>
> Ah-ha! you can't see it because it's a _hidden_ switch ... *evil grin*
>
> Joking aside ... Yes, sorry for the confusion - it's a setting in the 
> default Hadoop config, and Nutch config contains only overrides ... so 
> you need to put this explicitly into your hadoop-site.xml, like this:
>
> <property>
>    <name>mapred.speculative.execution</name>
>    <value>false</value>
> </property>
>
> If this fixes your problem, I'll put this property in the public sources.
>

Re: Error on convert to 0.9 during mergesegs step

Posted by karthik085 <ka...@gmail.com>.
Hmm..... Thanks.


Andrzej Bialecki wrote:
> 
> karthik085 wrote:
>> I put the property in hadoop-site.xml and still get the same error. Any 
>> other thoughts would be helpful. I am converting from 0.7.2 to 0.9. Is it 
>> because of that?
> 
> Convert tools support only migration from 0.8 to 0.9. Conversion from 
> 0.7.x to 0.8 (or 0.9) is NOT supported.
> 
> The recommended upgrade path is to dump your crawldb to a text file, 
> extract urls, and inject them back to a 0.9-based system. Unfortunately, 
> all other data is not (easily) upgradeable, so you will have to re-fetch 
> everything.
> 
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>   ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 



Re: Error on convert to 0.9 during mergesegs step

Posted by Andrzej Bialecki <ab...@getopt.org>.
karthik085 wrote:
> I put the property in hadoop-site.xml and still get the same error. Any other
> thoughts would be helpful. I am converting from 0.7.2 to 0.9. Is it because
> of that?

Convert tools support only migration from 0.8 to 0.9. Conversion from 
0.7.x to 0.8 (or 0.9) is NOT supported.

The recommended upgrade path is to dump your crawldb to a text file, 
extract urls, and inject them back to a 0.9-based system. Unfortunately, 
all other data is not (easily) upgradeable, so you will have to re-fetch 
everything.
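
Something along these lines (a sketch only -- the 0.7.x readdb dump 
flag and its output format vary by release, so treat those as 
assumptions; the inject form is the standard 0.8/0.9 one):

# on the 0.7.x install: dump the old WebDB pages as text
bin/nutch readdb db -dumppageurl > pages.txt
# pull out just the URLs into a seed directory (field layout assumed)
mkdir urls
grep 'URL:' pages.txt | sed 's/.*URL: //' > urls/seeds.txt
# on the 0.9 install: inject the seeds into a fresh crawldb
bin/nutch inject crawl/crawldb urls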


-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Error on convert to 0.9 during mergesegs step

Posted by karthik085 <ka...@gmail.com>.
I put the property in hadoop-site.xml and still get the same error. Any other
thoughts would be helpful. I am converting from 0.7.2 to 0.9. Is it because
of that?


Andrzej Bialecki wrote:
> 
> RP wrote:
>> Pardon my ignorance, but where and how do I do this? I don't see 
>> anything in the conf files that looks like a switch....
> 
> 
> Ah-ha! you can't see it because it's a _hidden_ switch ... *evil grin*
> 
> Joking aside ... Yes, sorry for the confusion - it's a setting in the 
> default Hadoop config, and Nutch config contains only overrides ... so 
> you need to put this explicitly into your hadoop-site.xml, like this:
> 
> <property>
>     <name>mapred.speculative.execution</name>
>     <value>false</value>
> </property>
> 
> If this fixes your problem, I'll put this property in the public sources.
> 
> -- 
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
> 
> 
> 
> 



Re: Error on convert to 0.9 during mergesegs step

Posted by Andrzej Bialecki <ab...@getopt.org>.
RP wrote:
> Pardon my ignorance, but where and how do I do this? I don't see 
> anything in the conf files that looks like a switch....


Ah-ha! you can't see it because it's a _hidden_ switch ... *evil grin*

Joking aside ... Yes, sorry for the confusion - it's a setting in the 
default Hadoop config, and Nutch config contains only overrides ... so 
you need to put this explicitly into your hadoop-site.xml, like this:

<property>
    <name>mapred.speculative.execution</name>
    <value>false</value>
</property>
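
Note that hadoop-site.xml has to be a complete Hadoop config file, so 
the property sits inside the <configuration> element; a minimal full 
file would look like this:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.speculative.execution</name>
    <value>false</value>
  </property>
</configuration>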

If this fixes your problem, I'll put this property in the public sources.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Error on convert to 0.9 during mergesegs step

Posted by RP <rp...@earthlink.net>.
Pardon my ignorance, but where and how do I do this? I don't see 
anything in the conf files that looks like a switch....

rp

Andrzej Bialecki wrote:
> RP wrote:
>> 2006-12-15 00:52:43,895 WARN  mapred.LocalJobRunner - job_dokmpz
>> java.lang.NullPointerException
>>       at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380)
>>       at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:136)
>>       at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:114)
>>       at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
>>       at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>>       at java.io.DataInputStream.read(DataInputStream.java:80)
>>       at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:200)
>>       at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:192)
>>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:75)
>>       at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:212)
>>       at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:373)
>>       at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:181)
>                                    ^^^^^^^^^^^^^^^^^^^^^^^
>
> Please set mapred.speculative.execution to false, and repeat.
>

-- 
rp

Richard S. Petzko
rpetzko@earthlink.net
310-503-8740 cell


Re: Error on convert to 0.9 during mergesegs step

Posted by Andrzej Bialecki <ab...@getopt.org>.
RP wrote:
> 2006-12-15 00:52:43,895 WARN  mapred.LocalJobRunner - job_dokmpz
> java.lang.NullPointerException
>       at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380)
>       at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:136)
>       at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:114)
>       at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>       at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>       at java.io.DataInputStream.read(DataInputStream.java:80)
>       at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:200)
>       at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:192)
>       at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:75)
>       at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:212)
>       at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:373)
>       at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:181)
                                    ^^^^^^^^^^^^^^^^^^^^^^^

Please set mapred.speculative.execution to false, and repeat.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com