Posted to user@nutch.apache.org by Daniel Clark <da...@verizon.net> on 2007/10/09 18:27:53 UTC

linkdb - Out of Memory Error

I received the following error during the linkdb stage of indexing.  Has
anyone encountered this before?  Is there a way of increasing the memory for
this stage in a config file?  Is there a known linkdb memory leak problem?

 

2007-10-09 10:56:37,787 INFO  crawl.LinkDb - LinkDb: starting
2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: linkdb: crawl/linkdb
2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: URL normalize: true
2007-10-09 10:56:37,788 INFO  crawl.LinkDb - LinkDb: URL filter: true
2007-10-09 10:56:37,886 INFO  crawl.LinkDb - LinkDb: adding segment: /user/daclark/crawl/segments/20071008185033
2007-10-09 10:56:39,977 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-10-09 10:56:42,495 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2007-10-09 10:56:51,415 WARN  mapred.TaskTracker - Error running child
java.lang.OutOfMemoryError: Java heap space
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:95)
        at java.io.DataOutputStream.write(DataOutputStream.java:90)
        at org.apache.hadoop.io.Text.writeString(Text.java:399)
        at org.apache.nutch.crawl.Inlink.write(Inlink.java:48)
        at org.apache.nutch.crawl.Inlinks.write(Inlinks.java:54)
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:315)
        at org.apache.nutch.crawl.LinkDb.map(LinkDb.java:167)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
2007-10-09 10:57:40,654 FATAL crawl.LinkDb - LinkDb: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:232)
        at org.apache.nutch.crawl.LinkDb.run(LinkDb.java:377)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:333)

~~~~~~~~~~~~~~~~~~~~~

Daniel Clark, President

DAC Systems, Inc.

(703) 403-0340

~~~~~~~~~~~~~~~~~~~~~

 


Re: linkdb - Out of Memory Error

Posted by Dennis Kubes <ku...@apache.org>.
You are welcome to send me your configuration files (nutch-site.xml,
hadoop-site.xml, and hadoop-env.sh from your conf directory).  I will
be happy to take a look and see if I can find something.  200,000 pages
should not cause Out of Memory errors; we routinely run with 50+M pages.

Also let me know your specs: OS, RAM on each box, swap space, and current
load.  I will see if I can spot what is happening.
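
For what it is worth, the memory-related knobs I usually check first live in
hadoop-env.sh and hadoop-site.xml.  A minimal sketch of the hadoop-env.sh side
(the value shown is just the stock default, not a recommendation for your boxes):

# conf/hadoop-env.sh - memory-related settings worth checking.
# HADOOP_HEAPSIZE sets the heap (in MB) of the Hadoop daemons themselves
# (namenode, jobtracker, tasktrackers); it does NOT apply to the map/reduce
# child JVMs that actually run the LinkDb map tasks.
export HADOOP_HEAPSIZE=1000

# Any extra daemon JVM flags would show up here as well.
# export HADOOP_OPTS=""

The heap of the child tasks is governed separately by the child opts property
in hadoop-site.xml (see the note further down the thread).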

Dennis Kubes

Sathyam Y wrote:
> Any other thoughts on Out of memory error in linkdb?
> 
> Thanks.
> 
> [earlier messages and quoted stack trace snipped]

Re: linkdb - Out of Memory Error

Posted by Sathyam Y <sa...@yahoo.com>.
Any other thoughts on the out of memory error in linkdb?

Thanks.

Sathyam Y <sa...@yahoo.com> wrote:
> Around 200,000 pages when it failed.
> 
> [earlier messages and quoted stack trace snipped]

Re: linkdb - Out of Memory Error

Posted by Sathyam Y <sa...@yahoo.com>.
Around 200,000 pages when it failed. 

Dennis Kubes <ku...@apache.org> wrote:
> How many pages are in your database?
> 
> [earlier messages and quoted stack trace snipped]

Re: linkdb - Out of Memory Error

Posted by Dennis Kubes <ku...@apache.org>.
How many pages are in your database?

Dennis Kubes

Sathyam Y wrote:
> I am getting the same out of memory exception in linkdb.  I have a
> configuration of 4 machines running Nutch 0.9 trunk.
> 
> [rest of the message and quoted stack trace snipped]

Re: linkdb - Out of Memory Error

Posted by Sathyam Y <sa...@yahoo.com>.
I am getting the same out of memory exception in linkdb.  I have a
configuration of 4 machines running Nutch 0.9 trunk.

Please let me know if you found a way to resolve this issue.  All tasks
(master and slaves) are running with the -Xmx1000m option and I am reluctant
to increase the heap size further.

Thanks.

Dennis Kubes <ku...@apache.org> wrote:
> Try setting your child opts to -Xmx512M or higher.  This config variable
> is found in hadoop-default.xml.  AFAIK there is no way to change the
> memory options for a single stage.
> 
> [original report and quoted stack trace snipped]

Re: linkdb - Out of Memory Error

Posted by Dennis Kubes <ku...@apache.org>.
Try setting your child opts to -Xmx512M or higher.  This config variable
is found in hadoop-default.xml.  AFAIK there is no way to change the
memory options for a single stage.
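
The property in question should be mapred.child.java.opts.  A minimal sketch
of the override, which belongs in conf/hadoop-site.xml rather than in
hadoop-default.xml itself (property name as shipped with stock Hadoop 0.1x;
pick an -Xmx value your nodes can actually afford):

<!-- conf/hadoop-site.xml: raise the heap of every map/reduce child JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>

Since Nutch picks this up from the same Hadoop configuration for every stage,
it applies to all of the MapReduce jobs it runs, not just LinkDb.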

Dennis Kubes

Daniel Clark wrote:
> I received the following error during the linkdb stage of indexing.  Has
> anyone encountered this before?  Is there a way of increasing the memory for
> this stage in a config file?  Is there a known linkdb memory leak problem?
> 
> [stack trace snipped; see the original message at the top of the thread]