Posted to user@nutch.apache.org by Vangelis karv <ka...@hotmail.com> on 2014/03/20 09:59:27 UTC
Java Heap Space error
Hello everybody! Yesterday, I tried to run a crawl at depth 5 and topN 120000. In the middle of the 5th depth I got this error:
2014-03-19 19:16:11,608 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-716 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:11,608 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 (queue crawl delay=0ms)
2014-03-19 19:16:22,291 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:24,677 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 (queue crawl delay=0ms)
2014-03-19 19:16:24,677 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/pollenalert/USCA9000 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:33,550 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:35,568 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 (queue crawl delay=0ms)
2014-03-19 19:16:35,568 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/recreation/outdoors/fishing/29547:21 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:41,928 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:43,535 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 (queue crawl delay=0ms)
2014-03-19 19:16:43,535 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/NV-allergen-1187 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:50,432 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:50,888 WARN fetcher.FetcherJob - fetch of http://www.weather.com/outlook/health/allergies/common/allergens/OH-allergen-928 failed with: java.lang.OutOfMemoryError: Java heap space
2014-03-19 19:16:51,580 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/health/allergies/common/allergens/FL-allergen-235 (queue crawl delay=0ms)
2014-03-19 19:16:53,120 ERROR http.Http - Failed with the following error:
2014-03-19 19:16:53,711 INFO fetcher.FetcherJob - fetching http://www.weather.com/outlook/recreation/outdoors/fishing/27891:21 (queue crawl delay=0ms)
2014-03-19 19:16:54,659 INFO fetcher.FetcherJob - -finishing thread FetcherThread20, activeThreads=46
2014-03-19 19:17:06,734 INFO fetcher.FetcherJob - -finishing thread FetcherThread48, activeThreads=44
2014-03-19 19:17:08,348 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
As you can see, I am running out of Java heap space. I ran this crawl using Nutch 2.2.1, Eclipse and MySQL.
Any ideas on how to solve this?
Recently, I changed the metadata field from BLOB to LONGBLOB and set http.content.limit to -1 (neither change has caused any trouble so far, though).
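One possible contributor, not confirmed anywhere in this thread: with http.content.limit set to -1, Nutch does not truncate downloaded content, so each fetcher thread may hold a whole response in memory. The sketch below is only a back-of-the-envelope estimate. The thread count comes from the "activeThreads=46" log line above; the 20 MiB response size and the class and method names are hypothetical, chosen for illustration.

```java
public class FetchHeapEstimate {

    // Worst-case bytes held if every active fetcher thread buffers
    // one full, untruncated response at the same time.
    static long worstCaseBytes(int activeThreads, long bytesPerResponse) {
        return Math.multiplyExact((long) activeThreads, bytesPerResponse);
    }

    public static void main(String[] args) {
        int activeThreads = 46;                       // from the log: activeThreads=46
        long assumedResponse = 20L * 1024 * 1024;     // ASSUMPTION: 20 MiB per response, not measured

        long needed = worstCaseBytes(activeThreads, assumedResponse);
        long maxHeap = Runtime.getRuntime().maxMemory(); // heap ceiling of the JVM running this check

        System.out.println("worst-case buffered content (MiB): " + (needed >> 20));
        System.out.println("JVM max heap (MiB):                " + (maxHeap >> 20));
    }
}
```

If the first number is anywhere near the second, a positive http.content.limit or fewer fetcher threads would be worth trying before raising the heap.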
RE: Java Heap Space error
Posted by Vangelis karv <ka...@hotmail.com>.
Thanks for your answer Remi!
That is another issue. I run my crawls through Eclipse, not through the standard script. In Run Configurations, under the Arguments tab, I added this to VM arguments: -Xms512M -Xmx2048M.
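To confirm that the VM arguments from the Eclipse run configuration actually reach the JVM, a small check like the following can be run with the same configuration. This is a minimal sketch using only standard JDK calls; the class name HeapCheck and the helper toMiB are made up for illustration and are not part of Nutch.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class HeapCheck {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // maxMemory() reflects the -Xmx ceiling (the reported value can be slightly lower).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));

        // List the arguments the JVM was started with, e.g. -Xms512M -Xmx2048M.
        List<String> vmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("VM arguments:     " + vmArgs);
    }
}
```

If -Xmx2048M does not appear in the printed arguments, the setting was probably added to a different run configuration than the one launching the crawl.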
> Date: Fri, 21 Mar 2014 17:12:21 +0800
> Subject: Re: Java Heap Space error
> From: tassingremi@gmail.com
> To: user@nutch.apache.org
>
> Hi,
>
> JAVA_HEAP_MAX value can be modified in the bin/nutch script
>
> Remi
Re: Java Heap Space error
Posted by remi tassing <ta...@gmail.com>.
Hi,
JAVA_HEAP_MAX value can be modified in the bin/nutch script
Remi
On Thu, Mar 20, 2014 at 11:11 PM, Vangelis karv <ka...@hotmail.com>wrote:
> I managed to crawl again but I have something else now:
>
> https://www.dropbox.com/s/853xf1evi8sb51v/error .
>
> Also, I found this :
> 2014-03-20 14:04:33,885 INFO mapreduce.GoraRecordWriter - Flushing the
> datastore after 20000 records.
>
> Thank you in advance!
RE: Java Heap Space error
Posted by Vangelis karv <ka...@hotmail.com>.
I managed to crawl again, but now I get a different error:
https://www.dropbox.com/s/853xf1evi8sb51v/error .
Also, I found this:
2014-03-20 14:04:33,885 INFO mapreduce.GoraRecordWriter - Flushing the datastore after 20000 records.
Thank you in advance!