Posted to user@nutch.apache.org by Sudip Datta <pi...@gmail.com> on 2011/11/01 20:54:11 UTC

Crawler stuck, crashes after fatal error in JRE

Hi,

My problem might not be suitable for the Nutch mailing list, but I
asked on Java mailing lists to no avail and wonder whether someone here
has experienced the same.

I am trying to crawl several hosts using Nutch (1.4) and store the
content in Solr with one index (core) per host. I posted this
problem earlier at
http://lucene.472066.n3.nabble.com/Nutch-Crawl-to-Solr-with-separate-cores-for-hosts-td3447260.html
and managed to get SolrWriter to create host-specific cores.

Unfortunately, while this works for a sample crawl on my local machine, it
gets stuck (and crashes with a fatal JRE error) on an EC2 instance (JRE
version: 6.0_27-b07), producing the error dump posted at
http://paste.pocoo.org/show/501326/.

Has anybody faced a similar problem, or does anyone have a clue about what
might be going wrong or which diagnostics to run? Please let me know if I can
provide any further information that might be useful.

Best regards,

--Sudip.

Re: Crawler stuck, crashes after fatal error in JRE

Posted by Sudip Datta <pi...@gmail.com>.

Hi Markus,

Thanks for the help. I also suspected a memory issue, but
after wrangling with the problem for days I tried it on a
different EC2 instance with the same JDK and, lo and behold, it worked
without the slightest problem.

I have no clue what was causing the trouble, which is frustrating,
but I am glad that it finally worked, albeit on a different machine.

Thanks,

--Sudip.

On Wed, Nov 2, 2011 at 2:14 AM, Markus Jelsma
<ma...@openindex.io> wrote:
> Hmm, it may also be a memory problem. You have both Nutch and Tomcat + Solr
> running on the same machine with limited RAM? 4GB allocated to Nutch and how
> much to Tomcat?
>
> Remember that file descriptors take memory too; it adds up significantly if
> there are many. Both Tomcat + Solr and Nutch can open a lot of them.

Re: Crawler stuck, crashes after fatal error in JRE

Posted by Markus Jelsma <ma...@openindex.io>.

Hmm, it may also be a memory problem. You have both Nutch and Tomcat + Solr 
running on the same machine with limited RAM? 4GB allocated to Nutch and how 
much to Tomcat?

Remember that file descriptors take memory too; it adds up significantly if
there are many. Both Tomcat + Solr and Nutch can open a lot of them.
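
A minimal sketch of how the heap limit and file descriptor usage mentioned
above could be read from inside a running JVM. The class name ResourceCheck
is hypothetical and not part of Nutch, Solr, or Tomcat. The sketch assumes a
Sun/Oracle or OpenJDK JVM on a Unix-like system, where the platform bean
implements com.sun.management.UnixOperatingSystemMXBean; on other JVMs the
descriptor counts are reported as -1. It only sees the process it runs in, so
it would have to run inside the Nutch or Tomcat JVM to be meaningful.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper for illustration; not part of Nutch, Solr, or Tomcat.
class ResourceCheck {

    /** Maximum heap this JVM will attempt to use, in bytes (reflects -Xmx when set). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Open file descriptors of this process, or -1 if the JVM does not expose the count. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1L;
    }

    /** Per-process file descriptor limit, or -1 if the JVM does not expose it. */
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getMaxFileDescriptorCount();
        }
        return -1L;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MB): " + maxHeapBytes() / (1024 * 1024));
        System.out.println("open file descriptors: " + openFileDescriptors());
        System.out.println("max file descriptors: " + maxFileDescriptors());
    }
}
```

Comparing the open count against the limit over the course of a crawl would
show whether descriptors are piling up before the hang.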




> Are you using any non-default or experimental JVM options? I've never seen
> this happen anywhere with standard Sun JVMs.

Re: Crawler stuck, crashes after fatal error in JRE

Posted by Markus Jelsma <ma...@openindex.io>.

Sounds like a memory issue. Can you check my other reply in this thread?

> No. This is the standard Sun (Oracle) JVM (Java version 1.6.0_27). I even
> tried 1.6.0_24, with the same effect. Only the time it takes for
> the crawler to hang and the JVM to crash varies, but then it varies even
> between different runs.

Re: Crawler stuck, crashes after fatal error in JRE

Posted by Sudip Datta <pi...@gmail.com>.

No. This is the standard Sun (Oracle) JVM (Java version 1.6.0_27). I even
tried 1.6.0_24, with the same effect. Only the time it takes for
the crawler to hang and the JVM to crash varies, but then it varies even
between different runs.

On Wed, Nov 2, 2011 at 2:05 AM, Markus Jelsma
<ma...@openindex.io> wrote:
> Are you using any non-default or experimental JVM options? I've never seen
> this happen anywhere with standard Sun JVMs.

Re: Crawler stuck, crashes after fatal error in JRE

Posted by Markus Jelsma <ma...@openindex.io>.

Are you using any non-default or experimental JVM options? I've never seen
this happen anywhere with standard Sun JVMs.
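
One way to answer this question from inside the running process is to print
the JVM version and the options it was started with. The sketch below is an
illustration only: the class name JvmOptionCheck is hypothetical, and
treating every option that starts with -X (which covers both -X and -XX:
flags) as "non-standard" is an assumption made here, not a rule from Nutch
or from this thread.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Nutch.
class JvmOptionCheck {

    /** Keeps only options starting with "-X", which covers both -X and -XX: flags. */
    static List<String> nonStandardOptions(List<String> jvmArgs) {
        List<String> result = new ArrayList<String>();
        for (String arg : jvmArgs) {
            if (arg.startsWith("-X")) {
                result.add(arg);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Options passed to this JVM at startup (not the program arguments).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("java.version: " + System.getProperty("java.version"));
        System.out.println("java.vm.name: " + System.getProperty("java.vm.name"));
        System.out.println("non-standard options: " + nonStandardOptions(jvmArgs));
    }
}
```

Running it with the same launch options as the crawler shows which flags
that JVM actually received.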
