Posted to user@nutch.apache.org by Vangelis karv <ka...@hotmail.com> on 2014/03/20 09:59:27 UTC

Java Heap Space error

Hello everybody! Yesterday, I tried to run a crawl at depth 5 and topN 120000. In the middle of the 5th depth I got this error:

2014-03-19 19:16:11,608 WARN  fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-716 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:11,608 INFO  fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 (queue crawl delay=0ms)
2014-03-19 19:16:22,291 ERROR http.Http - Failed with the following error: 
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:24,677 INFO  fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 (queue crawl delay=0ms)
2014-03-19 19:16:24,677 WARN  fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:33,550 ERROR http.Http - Failed with the following error: 
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:35,568 INFO  fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 (queue crawl delay=0ms)
2014-03-19 19:16:35,568 WARN  fetcher.FetcherJob - fetch of http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:41,928 ERROR http.Http - Failed with the following error: 
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:43,535 INFO  fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 (queue crawl delay=0ms)
2014-03-19 19:16:43,535 WARN  fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:50,432 ERROR http.Http - Failed with the following error: 
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:50,888 WARN  fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:51,580 INFO  fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-235 (queue crawl delay=0ms)
2014-03-19 19:16:53,120 ERROR http.Http - Failed with the following error: 
2014-03-19 19:16:53,711 INFO  fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/27891:21 (queue crawl delay=0ms)
2014-03-19 19:16:54,659 INFO  fetcher.FetcherJob - -finishing thread FetcherThread20, activeThreads=46
2014-03-19 19:17:06,734 INFO  fetcher.FetcherJob - -finishing thread FetcherThread48, activeThreads=44
2014-03-19 19:17:08,348 ERROR http.Http - Failed with the following error: 
java.lang.OutOfMemoryError: Java heap space

As you can see, I have problems with the Java heap space. I ran this crawl using Nutch 2.2.1, Eclipse and MySQL.

Any ideas on how to solve this thing? 
Recently, I changed the metadata field from blob to longblob and set http.content.limit to -1 (neither change has caused any trouble so far, though).
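
A rough sketch of why an unlimited content size combined with many fetcher threads might exhaust the heap. It assumes each fetcher thread can hold one complete response in memory at a time, which the log above does not confirm. The thread count of 50 is only inferred from the log (FetcherThread48, activeThreads=46), and the response sizes and the class name FetchMemoryEstimate are hypothetical.

```java
public class FetchMemoryEstimate {

    /** Worst-case bytes buffered if every thread holds one full response. */
    static long worstCaseBytes(int threads, long bytesPerResponse) {
        return (long) threads * bytesPerResponse;
    }

    public static void main(String[] args) {
        int threads = 50; // assumption, inferred from the thread names in the log
        // Hypothetical response sizes: 64 KiB, 1 MiB, 20 MiB.
        long[] sizes = {64L << 10, 1L << 20, 20L << 20};
        for (long size : sizes) {
            long total = worstCaseBytes(threads, size);
            System.out.printf("%d threads x %d KiB = %d MiB%n",
                    threads, size >> 10, total >> 20);
        }
    }
}
```

Under these assumptions, 50 responses of 20 MiB each would already need about 1000 MiB of heap, before counting anything else the crawl keeps in memory.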

RE: Java Heap Space error

Posted by Vangelis karv <ka...@hotmail.com>.
Thanks for your answer Remi!
That is another issue. I run my crawls through Eclipse, not through the standard script. In Run Configurations, under the Arguments tab, I added this to VM Arguments: -Xms512M -Xmx2048M.
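
One way to confirm that the VM arguments actually reach the JVM launched from Eclipse is to print the heap limits at startup. This is a minimal sketch using only standard JDK calls; the class name HeapCheck is hypothetical and it is not part of Nutch.

```java
import java.lang.management.ManagementFactory;

public class HeapCheck {

    /** Converts a byte count to whole mebibytes. */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects the upper heap limit the JVM started with (-Xmx).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // totalMemory() is the heap currently reserved; freeMemory() is the unused part of it.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
        // Lists the options passed to this JVM, e.g. -Xms512M -Xmx2048M.
        System.out.println("JVM arguments:    "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running it with the same Run Configuration as the crawl should show whether -Xmx2048M is in effect; the reported maximum can be slightly below the configured value.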

> Date: Fri, 21 Mar 2014 17:12:21 +0800
> Subject: Re: Java Heap Space error
> From: tassingremi@gmail.com
> To: user@nutch.apache.org
> 
> Hi,
> 
> JAVA_HEAP_MAX value can be modified in the bin/nutch script
> 
> Remi

Re: Java Heap Space error

Posted by remi tassing <ta...@gmail.com>.
Hi,

The JAVA_HEAP_MAX value can be modified in the bin/nutch script.

Remi


On Thu, Mar 20, 2014 at 11:11 PM, Vangelis karv <ka...@hotmail.com> wrote:

> I managed to crawl again but I have something else now:
>
> https://www.dropbox.com/s/853xf1evi8sb51v/error .
>
> Also, I found this :
> 2014-03-20 14:04:33,885 INFO  mapreduce.GoraRecordWriter - Flushing the
> datastore after 20000 records.
>
> Thank you in advance!

RE: Java Heap Space error

Posted by Vangelis karv <ka...@hotmail.com>.
I managed to crawl again but I have something else now: 

https://www.dropbox.com/s/853xf1evi8sb51v/error . 

Also, I found this:
2014-03-20 14:04:33,885 INFO  mapreduce.GoraRecordWriter - Flushing the datastore after 20000 records.

Thank you in advance!
