Posted to user@nutch.apache.org by Daniel Clark <da...@verizon.net> on 2007/10/09 18:27:53 UTC
linkdb - Out of Memory Error
I received the following error during the linkdb stage of indexing. Has
anyone encountered this before? Is there a way of increasing memory for
this stage in a config file? Is there a known linkdb memory leak problem?
2007-10-09 10:56:37,787 INFO crawl.LinkDb - LinkDb: starting
2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL normalize: true
2007-10-09 10:56:37,788 INFO crawl.LinkDb - LinkDb: URL filter: true
2007-10-09 10:56:37,886 INFO crawl.LinkDb - LinkDb: adding segment: /user/daclark/crawl/segments/20071008185033
2007-10-09 10:56:39,977 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-10-09 10:56:42,495 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-10-09 10:56:51,415 WARN mapred.TaskTracker - Error running child
java.lang.OutOfMemoryError: Java heap space
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.writeString(Text.java:399)
        at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
        at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
        at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
        at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)
~~~~~~~~~~~~~~~~~~~~~
Daniel Clark, President
DAC Systems, Inc.
(703) 403-0340
~~~~~~~~~~~~~~~~~~~~~
Re: linkdb - Out of Memory Error
Posted by Dennis Kubes <ku...@apache.org>.
You are welcome to send me your configuration files (nutch-site.xml,
hadoop-site.xml, and hadoop-env.sh from your conf directory). I will
be happy to take a look and see if I can find something. 200,000 pages
should not cause Out of Memory errors; we routinely run with 50M+ pages.
Also let me know your specs: OS, RAM on each box, swap space, and current
load. I will see if I can spot what is happening.
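In particular, note that heap settings made through hadoop-env.sh
(HADOOP_HEAPSIZE) only apply to the Hadoop daemons; the map and reduce
child JVMs take their heap from mapred.child.java.opts, which defaults to
a small value in hadoop-default.xml (-Xmx200m, if I remember right). So it
is worth confirming that hadoop-site.xml on every node carries an override
along these lines (the 1000m value is just an example, matching what you
said you run elsewhere):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1000m</value>
  </property>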
Dennis Kubes
Sathyam Y wrote:
> Any other thoughts on the out-of-memory error in linkdb?
>
> Thanks.
>
>
> Sathyam Y <sa...@yahoo.com> wrote:
> Around 200,000 pages when it failed.
>
> Dennis Kubes wrote:
> How many pages are in your database?
>
> Dennis Kubes
>
> Sathyam Y wrote:
>> I am getting the same out-of-memory exception in linkdb. I have a configuration of 4 machines running the Nutch 0.9 trunk.
>>
>> Please let me know if you found a way to resolve this issue. All tasks (master and slaves) are running with the -Xmx1000m option, and I am reluctant to increase the heap size further.
>>
>> Thanks.
>>
>> Dennis Kubes wrote:
>> Try setting your child opts to -Xmx512M or higher. This config variable
>> is found in the hadoop-default.xml. AFAIK there is no way to change the
>> memory options for a single stage.
>>
>> Dennis Kubes
>>
>> Daniel Clark wrote:
>>> I received the following error during the linkdb stage of indexing. Has
>>> anyone encountered this before? Is there a way of increasing memory for
>>> this stage in a config file? Is there a known linkdb memory leak problem?
>>> [log output, stack trace, and signature snipped; identical to the original message above]
Re: linkdb - Out of Memory Error
Posted by Sathyam Y <sa...@yahoo.com>.
Any other thoughts on the out-of-memory error in linkdb?
Thanks.
Sathyam Y <sa...@yahoo.com> wrote:
Around 200,000 pages when it failed.
Dennis Kubes wrote:
How many pages are in your database?
Dennis Kubes
Sathyam Y wrote:
> I am getting the same out-of-memory exception in linkdb. I have a configuration of 4 machines running the Nutch 0.9 trunk.
>
> Please let me know if you found a way to resolve this issue. All tasks (master and slaves) are running with the -Xmx1000m option, and I am reluctant to increase the heap size further.
>
> Thanks.
>
> Dennis Kubes wrote:
> Try setting your child opts to -Xmx512M or higher. This config variable
> is found in the hadoop-default.xml. AFAIK there is no way to change the
> memory options for a single stage.
>
> Dennis Kubes
>
> Daniel Clark wrote:
>> I received the following error during the linkdb stage of indexing. Has
>> anyone encountered this before? Is there a way of increasing memory for
>> this stage in a config file? Is there a known linkdb memory leak problem?
>> [log output, stack trace, and signature snipped; identical to the original message above]
Re: linkdb - Out of Memory Error
Posted by Sathyam Y <sa...@yahoo.com>.
Around 200,000 pages when it failed.
Dennis Kubes <ku...@apache.org> wrote:
How many pages are in your database?
Dennis Kubes
Sathyam Y wrote:
> I am getting the same out-of-memory exception in linkdb. I have a configuration of 4 machines running the Nutch 0.9 trunk.
>
> Please let me know if you found a way to resolve this issue. All tasks (master and slaves) are running with the -Xmx1000m option, and I am reluctant to increase the heap size further.
>
> Thanks.
>
> Dennis Kubes wrote:
> Try setting your child opts to -Xmx512M or higher. This config variable
> is found in the hadoop-default.xml. AFAIK there is no way to change the
> memory options for a single stage.
>
> Dennis Kubes
>
> Daniel Clark wrote:
>> I received the following error during the linkdb stage of indexing. Has
>> anyone encountered this before? Is there a way of increasing memory for
>> this stage in a config file? Is there a known linkdb memory leak problem?
>> [log output, stack trace, and signature snipped; identical to the original message above]
Re: linkdb - Out of Memory Error
Posted by Dennis Kubes <ku...@apache.org>.
How many pages are in your database?
Dennis Kubes
Sathyam Y wrote:
> I am getting the same out-of-memory exception in linkdb. I have a configuration of 4 machines running the Nutch 0.9 trunk.
>
> Please let me know if you found a way to resolve this issue. All tasks (master and slaves) are running with the -Xmx1000m option, and I am reluctant to increase the heap size further.
>
> Thanks.
>
> Dennis Kubes <ku...@apache.org> wrote:
> Try setting your child opts to -Xmx512M or higher. This config variable
> is found in the hadoop-default.xml. AFAIK there is no way to change the
> memory options for a single stage.
>
> Dennis Kubes
>
> Daniel Clark wrote:
>> I received the following error during the linkdb stage of indexing. Has
>> anyone encountered this before? Is there a way of increasing memory for
>> this stage in a config file? Is there a known linkdb memory leak problem?
>> [log output, stack trace, and signature snipped; identical to the original message above]
Re: linkdb - Out of Memory Error
Posted by Sathyam Y <sa...@yahoo.com>.
I am getting the same out-of-memory exception in linkdb. I have a configuration of 4 machines running the Nutch 0.9 trunk.
Please let me know if you found a way to resolve this issue. All tasks (master and slaves) are running with the -Xmx1000m option, and I am reluctant to increase the heap size further.
Thanks.
Dennis Kubes <ku...@apache.org> wrote:
Try setting your child opts to -Xmx512M or higher. This config variable
is found in the hadoop-default.xml. AFAIK there is no way to change the
memory options for a single stage.
Dennis Kubes
Daniel Clark wrote:
> I received the following error during the linkdb stage of indexing. Has
> anyone encountered this before? Is there a way of increasing memory for
> this stage in a config file? Is there a known linkdb memory leak problem?
> [log output, stack trace, and signature snipped; identical to the original message above]
Re: linkdb - Out of Memory Error
Posted by Dennis Kubes <ku...@apache.org>.
Try setting your child opts to -Xmx512M or higher. This config variable
is found in the hadoop-default.xml. AFAIK there is no way to change the
memory options for a single stage.
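For example, in hadoop-site.xml (a sketch; check the property name and
default against the hadoop-default.xml that ships with your Hadoop
version):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
    <description>JVM options passed to each map and reduce child task.</description>
  </property>

This is read as part of the job configuration, so it should take effect
on the next job submitted with the updated conf.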
Dennis Kubes
Daniel Clark wrote:
> I received the following error during the linkdb stage of indexing. Has
> anyone encountered this before? Is there a way of increasing memory for
> this stage in a config file? Is there a known linkdb memory leak problem?
> [log output, stack trace, and signature snipped; identical to the original message above]