Posted to user@nutch.apache.org by RP <rp...@earthlink.net> on 2006/12/15 17:32:20 UTC
Error on convert to 0.9 during mergesegs step
HELP - I'm hitting an error migrating from 0.8 to 0.9, following the
procedure outlined in the Wiki. It happens at the mergesegs step; the
crawldb conversion went fine. I'm trying to deal with the segments now
because I'm on a slow connection and it would be painful to re-crawl.
Has anyone seen this, or have any ideas on how to get past it? Nutch is
on Fedora, using Java 1.5. Initial crawls on 0.9 seem to work just fine,
but I haven't tried a mergesegs on the 0.9-fetched segments yet.
Also this one: 2006-12-15 00:28:22,061 WARN util.NativeCodeLoader -
Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable.
I can see how to build the native library from the Hadoop source, but
where does it go in the Nutch world? We've only got the Hadoop jar in
lib. Does the native library bring any performance boost to the table?
Std Out:
SegmentMerger: adding crawl/segments.old/20061209202120
SegmentMerger: adding crawl/segments.old/20061212194145
SegmentMerger: using segment data from: content crawl_generate crawl_fetch
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:399)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:547)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:595)
From Hadoop log:
2006-12-15 00:08:51,133 INFO segment.SegmentMerger - Merging 29 segments to crawl/converted8/20061215000851
2006-12-15 00:08:51,345 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-0
2006-12-15 00:08:51,350 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-1
2006-12-15 00:08:51,389 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-10
2006-12-15 00:08:51,390 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-11
2006-12-15 00:08:51,391 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-12
2006-12-15 00:08:51,392 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-13
2006-12-15 00:08:51,394 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-14
2006-12-15 00:08:51,395 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-15
2006-12-15 00:08:51,396 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-16
2006-12-15 00:08:51,397 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-17
2006-12-15 00:08:51,400 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-18
2006-12-15 00:08:51,401 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-19
2006-12-15 00:08:51,405 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-2
2006-12-15 00:08:51,407 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-20
2006-12-15 00:08:51,408 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-21
2006-12-15 00:08:51,409 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-22
2006-12-15 00:08:51,442 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-23
2006-12-15 00:08:51,444 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-24
2006-12-15 00:08:51,445 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-25
2006-12-15 00:08:51,453 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-26
2006-12-15 00:08:51,455 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-3
2006-12-15 00:08:51,456 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-4
2006-12-15 00:08:51,457 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-5
2006-12-15 00:08:51,458 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-6
2006-12-15 00:08:51,459 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-7
2006-12-15 00:08:51,500 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-8
2006-12-15 00:08:51,501 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061208184550-9
2006-12-15 00:08:51,502 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061209202120
2006-12-15 00:08:51,503 INFO segment.SegmentMerger - SegmentMerger: adding crawl/segments.old/20061212194145
2006-12-15 00:08:51,505 INFO segment.SegmentMerger - SegmentMerger: using segment data from: content crawl_generate crawl_fetch
2006-12-15 00:28:22,061 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2006-12-15 00:52:43,895 WARN mapred.LocalJobRunner - job_dokmpz
java.lang.NullPointerException
        at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380)
        at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:136)
        at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:114)
        at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
        at java.io.DataInputStream.read(DataInputStream.java:80)
        at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:200)
        at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:192)
        at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:75)
        at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:212)
        at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:373)
        at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:181)
        at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:211)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:315)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:137)
--
rp
Re: Error on convert to 0.9 during mergesegs step
Posted by RP <rp...@earthlink.net>.
Andrzej - Thanks, that conf switch seemed to take care of it! Any idea
if the Hadoop native stuff will give any performance boost?
rp
Andrzej Bialecki wrote:
> RP wrote:
>> Pardon my ignorance, but where and how do I do this? Nothing in the
>> conf files that I can see as a switch....
>
>
> Ah-ha! You can't see it because it's a _hidden_ switch ... *evil grin*
>
> Joking aside ... yes, sorry for the confusion - it's a setting in the
> default Hadoop config, and the Nutch config contains only overrides, so
> you need to put it explicitly into your hadoop-site.xml, like this:
>
> <property>
>   <name>mapred.speculative.execution</name>
>   <value>false</value>
> </property>
>
> If this fixes your problem, I'll put this property in the public sources.
>
Re: Error on convert to 0.9 during mergesegs step
Posted by karthik085 <ka...@gmail.com>.
Hmm..... Thanks.
Andrzej Bialecki wrote:
>
> karthik085 wrote:
>> I put the property in hadoop-site.xml and still get the same error.
>> Any other thoughts would be helpful. I am converting from 0.7.2 to
>> 0.9. Is it because of that?
>
> The convert tools support only migration from 0.8 to 0.9. Conversion
> from 0.7.x to 0.8 (or 0.9) is NOT supported.
>
> The recommended upgrade path is to dump your crawldb to a text file,
> extract the URLs, and inject them back into a 0.9-based system.
> Unfortunately, none of the other data is (easily) upgradeable, so you
> will have to re-fetch everything.
>
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
Re: Error on convert to 0.9 during mergesegs step
Posted by Andrzej Bialecki <ab...@getopt.org>.
karthik085 wrote:
> I put the property in hadoop-site.xml and still get the same error. Any other
> thoughts would be helpful. I am converting from 0.7.2 to 0.9. Is it because
> of that?
The convert tools support only migration from 0.8 to 0.9. Conversion from
0.7.x to 0.8 (or 0.9) is NOT supported.
The recommended upgrade path is to dump your crawldb to a text file,
extract the URLs, and inject them back into a 0.9-based system.
Unfortunately, none of the other data is (easily) upgradeable, so you will
have to re-fetch everything.
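In shell terms that path would look roughly like the sketch below. It is
an untested outline: the -dumppageurl flag and the "URL:" prefix in the
dump are how I recall the 0.7.x WebDB reader behaving, so check
"bin/nutch readdb" usage and the actual dump format before trusting the
awk step.

  # from the old 0.7.x installation: dump the WebDB pages as plain text
  bin/nutch readdb crawl/db -dumppageurl > webdb-dump.txt
  # keep only the URL lines, one URL per line, as a seed list
  mkdir -p urls
  awk '/^URL:/ {print $2}' webdb-dump.txt > urls/seed.txt
  # from the new 0.9 installation: inject the seeds into a fresh crawldb
  bin/nutch inject crawl/crawldb urls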
--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: Error on convert to 0.9 during mergesegs step
Posted by karthik085 <ka...@gmail.com>.
I put the property in hadoop-site.xml and still get the same error. Any other
thoughts would be helpful. I am converting from 0.7.2 to 0.9. Is it because
of that?
Andrzej Bialecki wrote:
>
> RP wrote:
>> Pardon my ignorance, but where and how do I do this? Nothing in the
>> conf files that I can see as a switch....
>
>
> Ah-ha! You can't see it because it's a _hidden_ switch ... *evil grin*
>
> Joking aside ... yes, sorry for the confusion - it's a setting in the
> default Hadoop config, and the Nutch config contains only overrides, so
> you need to put it explicitly into your hadoop-site.xml, like this:
>
> <property>
>   <name>mapred.speculative.execution</name>
>   <value>false</value>
> </property>
>
> If this fixes your problem, I'll put this property in the public sources.
>
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  || |   Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
>
Re: Error on convert to 0.9 during mergesegs step
Posted by Andrzej Bialecki <ab...@getopt.org>.
RP wrote:
> Pardon my ignorance, but where and how do I do this? Nothing in the
> conf files that I can see as a switch....
Ah-ha! You can't see it because it's a _hidden_ switch ... *evil grin*
Joking aside ... yes, sorry for the confusion - it's a setting in the
default Hadoop config, and the Nutch config contains only overrides, so
you need to put it explicitly into your hadoop-site.xml, like this:
<property>
  <name>mapred.speculative.execution</name>
  <value>false</value>
</property>
If this fixes your problem, I'll put this property in the public sources.
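For completeness, a full conf/hadoop-site.xml carrying just this override
would look like the sketch below; the <configuration> wrapper is the
standard Hadoop site-config boilerplate. The comment about *why* the
switch helps is an educated guess from the stack trace, not something
the logs prove.

<?xml version="1.0"?>
<configuration>
  <!-- Disable duplicate (speculative) task attempts; the assumption is
       that their commits were racing in LocalJobRunner's reduce phase. -->
  <property>
    <name>mapred.speculative.execution</name>
    <value>false</value>
  </property>
</configuration>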
--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
Re: Error on convert to 0.9 during mergesegs step
Posted by RP <rp...@earthlink.net>.
Pardon my ignorance, but where and how do I do this? Nothing in the conf
files that I can see as a switch....
rp
Andrzej Bialecki wrote:
> RP wrote:
>> 2006-12-15 00:52:43,895 WARN mapred.LocalJobRunner - job_dokmpz
>> java.lang.NullPointerException
>>         at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380)
>>         at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:136)
>>         at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:114)
>>         at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>>         at java.io.DataInputStream.read(DataInputStream.java:80)
>>         at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:200)
>>         at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:192)
>>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:75)
>>         at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:212)
>>         at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:373)
>>         at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:181)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Please set mapred.speculative.execution to false, and repeat.
>
--
rp
Richard S. Petzko
rpetzko@earthlink.net
310-503-8740 cell
Re: Error on convert to 0.9 during mergesegs step
Posted by Andrzej Bialecki <ab...@getopt.org>.
RP wrote:
> 2006-12-15 00:52:43,895 WARN mapred.LocalJobRunner - job_dokmpz
> java.lang.NullPointerException
>         at org.apache.hadoop.fs.LocalFileSystem.reportChecksumFailure(LocalFileSystem.java:380)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.verifySum(FSDataInputStream.java:136)
>         at org.apache.hadoop.fs.FSDataInputStream$Checker.read(FSDataInputStream.java:114)
>         at org.apache.hadoop.fs.FSDataInputStream$PositionCache.read(FSDataInputStream.java:189)
>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:254)
>         at java.io.BufferedInputStream.read(BufferedInputStream.java:313)
>         at java.io.DataInputStream.read(DataInputStream.java:80)
>         at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:200)
>         at org.apache.hadoop.fs.FileUtil.copyContent(FileUtil.java:192)
>         at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:75)
>         at org.apache.hadoop.fs.LocalFileSystem.renameRaw(LocalFileSystem.java:212)
>         at org.apache.hadoop.fs.FileSystem.rename(FileSystem.java:373)
>         at org.apache.hadoop.mapred.PhasedFileSystem.commit(PhasedFileSystem.java:181)
>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Please set mapred.speculative.execution to false, and repeat.
--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  || |   Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com