Posted to user@nutch.apache.org by Bai Shen <ba...@gmail.com> on 2012/08/08 21:32:55 UTC

java.lang.OutOfMemoryError: GC overhead limit exceeded

Is this something other people are seeing?  I was parsing 10k URLs when I
got this exception.  I'm running Nutch 2 head as of Aug 6 with the default
memory settings (1 GB).

Just wondering if anybody else has experienced this on Nutch 2.

Thanks.

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by alxsss <al...@aim.com>.
I was able to run jstack just before the program exited. The output is attached.
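
For reference, a thread dump like the attached one can be captured while the
process is still alive with the command below, where <pid> stands for the
fetcher JVM's process id and the output file name is arbitrary:

jstack <pid> > fetcher-threads.txt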

-----Original Message-----
From: alxsss <al...@aim.com>
To: user <us...@nutch.apache.org>
Sent: Sat, Aug 11, 2012 2:17 pm
Subject: Re: java.lang.OutOfMemoryError: GC overhead limit exceeded


Hello,

I am getting the same error and here is the log

2012-08-11 13:33:08,223 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
        at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:243)
        at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:161)
        at org.apache.nutch.protocol.http.Http.getResponse(Http.java:68)
        at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:142)
        at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:521)

Thanks.
Alex.

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by alxsss <al...@aim.com>.
Hello,

I am getting the same error and here is the log

2012-08-11 13:33:08,223 ERROR http.Http - Failed with the following error:
java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
        at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:243)
        at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:161)
        at org.apache.nutch.protocol.http.Http.getResponse(Http.java:68)
        at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:142)
        at org.apache.nutch.fetcher.FetcherReducer$FetcherThread.run(FetcherReducer.java:521)

Thanks.
Alex.
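
A note on this trace: HttpResponse.readPlainContent buffers the entire HTTP
response in memory before handing it off, so if http.content.limit in
nutch-site.xml has been raised or set to -1 (unlimited), a single very large
response can exhaust the heap on its own. A bounded value, sketched below with
an example limit of 1 MB (the stock default is 64 kB), avoids that:

<property>
    <name>http.content.limit</name>
    <value>1048576</value>
    <description>Maximum number of bytes to download per document over HTTP;
    content beyond this limit is truncated.</description>
</property>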


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Bai Shen <ba...@gmail.com>.
It was crawling HTML files when it started throwing the exception.
Unfortunately, I didn't keep copies of the files or URLs.

On Thu, Aug 9, 2012 at 3:07 AM, Ferdy Galema <fe...@kalooga.com> wrote:

> Hi,
>
> Of course setting a bigger heap sure helps, but most of the time only
> temporary. Can you see in the logs what type of documents are parsed?
>
> In case of html documents crawled on the wild web, a single document can
> cause the heap to explode. By default the cyberneko parser (in HtmlParser)
> is used for html documents. I hacked this library so that there are limits
> in the number of elements that are loaded during a parse. (I'm still trying
> to find a way to contribute this back into the codebase).
>
> Ferdy.
>
> On Wed, Aug 8, 2012 at 10:03 PM, Niccolò Becchi <niccolo.becchi@gmail.com
> >wrote:
>
> > If you are using Nutch in an hadoop cluster and you have enough memory
> try
> > with this parameters:
> >
> > <property>
> >     <name>mapred.child.java.opts</name>
> >     <value>-Xmx1600m -XX:-UseGCOverheadLimit
> > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp</value>
> > </property>
> >
> > On Wed, Aug 8, 2012 at 9:32 PM, Bai Shen <ba...@gmail.com>
> wrote:
> >
> > > Is this something other people are seeing?  I was parsing 10k urls
> when I
> > > got this exception.  I'm running Nutch 2 head as of Aug 6 with the
> > default
> > > memory settings(1 GB).
> > >
> > > Just wondering if anybody else has experienced this on Nutch 2.
> > >
> > > Thanks.
> > >
> >
>

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Niccolò Becchi <ni...@gmail.com>.
Hi Ferdy,
When you get the OutOfMemoryError, if you have these options on the JVM:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp
you get a file on your filesystem containing a heap dump taken at the moment
of the failure.

You can use http://www.eclipse.org/mat/ (an Eclipse plugin), a Java heap
analyzer that helps you find memory leaks and reduce memory consumption. It
can help you understand whether there really is a problem.
However, the first thing to try (for me) is simply to increase the memory and
see whether that makes it work.
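
For completeness, a heap dump can also be taken manually while the process is
still running; <pid> below is a placeholder for the Java process id, the output
path is arbitrary, and the resulting .hprof file opens directly in MAT:

jmap -dump:format=b,file=/var/tmp/manual.hprof <pid>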

On Thu, Aug 9, 2012 at 9:07 AM, Ferdy Galema <fe...@kalooga.com> wrote:

> Hi,
>
> Of course setting a bigger heap sure helps, but most of the time only
> temporary. Can you see in the logs what type of documents are parsed?
>
> In case of html documents crawled on the wild web, a single document can
> cause the heap to explode. By default the cyberneko parser (in HtmlParser)
> is used for html documents. I hacked this library so that there are limits
> in the number of elements that are loaded during a parse. (I'm still trying
> to find a way to contribute this back into the codebase).
>
> Ferdy.
>
> On Wed, Aug 8, 2012 at 10:03 PM, Niccolò Becchi <niccolo.becchi@gmail.com
> >wrote:
>
> > If you are using Nutch in an hadoop cluster and you have enough memory
> try
> > with this parameters:
> >
> > <property>
> >     <name>mapred.child.java.opts</name>
> >     <value>-Xmx1600m -XX:-UseGCOverheadLimit
> > -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp</value>
> > </property>
> >
> > On Wed, Aug 8, 2012 at 9:32 PM, Bai Shen <ba...@gmail.com>
> wrote:
> >
> > > Is this something other people are seeing?  I was parsing 10k urls
> when I
> > > got this exception.  I'm running Nutch 2 head as of Aug 6 with the
> > default
> > > memory settings(1 GB).
> > >
> > > Just wondering if anybody else has experienced this on Nutch 2.
> > >
> > > Thanks.
> > >
> >
>

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Ferdy Galema <fe...@kalooga.com>.
Hi,

Of course setting a bigger heap certainly helps, but most of the time only
temporarily. Can you see in the logs what type of documents are being parsed?

In the case of HTML documents crawled on the open web, a single document can
cause the heap to explode. By default the CyberNeko parser (in HtmlParser) is
used for HTML documents. I hacked this library so that there are limits on the
number of elements loaded during a parse. (I'm still trying to find a way to
contribute this back into the codebase.)

Ferdy.
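
To make the idea concrete, here is a minimal sketch of that kind of limit, not
the actual patch; the class name and the 100000 threshold are invented, and it
drives NekoHTML through its plain SAX front end rather than the DOM fragment
parser that Nutch's parse-html plugin actually uses:

import org.cyberneko.html.parsers.SAXParser;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Aborts the parse as soon as the element count passes a fixed limit, so a
// single pathological page cannot keep growing the heap.
public class ElementLimitHandler extends DefaultHandler {
    private final int maxElements;
    private int seen = 0;

    public ElementLimitHandler(int maxElements) {
        this.maxElements = maxElements;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        if (++seen > maxElements) {
            throw new SAXException("More than " + maxElements + " elements, giving up");
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = new SAXParser();                  // NekoHTML SAX parser
        parser.setContentHandler(new ElementLimitHandler(100000));
        parser.parse(new InputSource(args[0]));              // path or URL of an HTML page
    }
}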

On Wed, Aug 8, 2012 at 10:03 PM, Niccolò Becchi <ni...@gmail.com> wrote:

> If you are using Nutch in an hadoop cluster and you have enough memory try
> with this parameters:
>
> <property>
>     <name>mapred.child.java.opts</name>
>     <value>-Xmx1600m -XX:-UseGCOverheadLimit
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp</value>
> </property>
>
> On Wed, Aug 8, 2012 at 9:32 PM, Bai Shen <ba...@gmail.com> wrote:
>
> > Is this something other people are seeing?  I was parsing 10k urls when I
> > got this exception.  I'm running Nutch 2 head as of Aug 6 with the
> default
> > memory settings(1 GB).
> >
> > Just wondering if anybody else has experienced this on Nutch 2.
> >
> > Thanks.
> >
>

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Posted by Niccolò Becchi <ni...@gmail.com>.
If you are using Nutch in a Hadoop cluster and you have enough memory, try
these parameters:

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1600m -XX:-UseGCOverheadLimit
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/tmp</value>
</property>
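
Note that mapred.child.java.opts only applies to the Hadoop map/reduce child
JVMs. For a local (non-distributed) run, the heap of the Nutch process itself
is what matters; assuming the stock bin/nutch script, it can usually be raised
via the NUTCH_HEAPSIZE environment variable (value in megabytes), for example:

export NUTCH_HEAPSIZE=1600
bin/nutch ...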

On Wed, Aug 8, 2012 at 9:32 PM, Bai Shen <ba...@gmail.com> wrote:

> Is this something other people are seeing?  I was parsing 10k urls when I
> got this exception.  I'm running Nutch 2 head as of Aug 6 with the default
> memory settings(1 GB).
>
> Just wondering if anybody else has experienced this on Nutch 2.
>
> Thanks.
>