Posted to solr-user@lucene.apache.org by Doss <it...@gmail.com> on 2014/11/19 10:04:52 UTC

SOLR not starting after restart 2 node cloud setup

I have a two-node SOLR (4.9.0) cloud with Tomcat (8) and Zookeeper. At times
SOLR on Node 1 stops responding; to fix the issue I restart Tomcat on
Node 1, but SOLR does not start back up. If I remove the Solr cores on both
nodes and restart, it starts working, but then I have to reindex all the
data. We are using this setup in production, and because of this issue we
have 1 to 1.5 hours of service downtime. Any suggestions would be greatly
appreciated.

Thanks,
Doss.

Re: SOLR not starting after restart 2 node cloud setup

Posted by Erick Erickson <er...@gmail.com>.
Glad you found a solution!

Best,
Erick

Re: SOLR not starting after restart 2 node cloud setup

Posted by Doss <it...@gmail.com>.
Dear Erick,

Thanks for your thoughts, they helped me a lot. In my instances no Solr
logs are appended to catalina.out.

I have now placed the log4j.properties file, so Solr logs are captured in
the solr.log file; with its help I found the reason for the issue.
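
In case it is useful to others, a minimal log4j.properties along these
lines (the log path is only an example) is enough to get a solr.log; it can
sit on the webapp classpath or be pointed to with -Dlog4j.configuration:

 # log4j.properties - send Solr logging to its own rolling file (example path)
 log4j.rootLogger=INFO, file
 log4j.appender.file=org.apache.log4j.RollingFileAppender
 log4j.appender.file.File=/var/log/solr/solr.log
 log4j.appender.file.MaxFileSize=50MB
 log4j.appender.file.MaxBackupIndex=9
 log4j.appender.file.layout=org.apache.log4j.PatternLayout
 log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2} - %m%n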

I was starting Tomcat with the option -Dbootstrap_conf=true, which made
Solr look for the core configuration files in the wrong directory; after
removing this option it started without any issues.
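
For anyone with a similar setup: the flag comes in through Tomcat's JVM
options, so the change amounts to something like the following in
bin/setenv.sh (the paths and ZooKeeper hosts here are only placeholders,
not my exact setup):

 # removed: this made Solr look for core configs in the wrong directory
 # CATALINA_OPTS="$CATALINA_OPTS -Dbootstrap_conf=true"

 # kept: point Solr at its home directory and the ZooKeeper ensemble
 CATALINA_OPTS="$CATALINA_OPTS -Dsolr.solr.home=/opt/solr/home"
 CATALINA_OPTS="$CATALINA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181"
 export CATALINA_OPTS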

I also commented out the suggester component, which made Solr load faster.

Thanks,
Doss.

Re: SOLR not starting after restart 2 node cloud setup

Posted by Erick Erickson <er...@gmail.com>.
Doss:

Tomcat often puts things in "catalina.out", so you might check there;
I've often seen logging information from Solr go there by default.

Without having some idea what kinds of problems Solr is
reporting when you see this situation, it's really hard to say.

Some things I'd check first, though, in order of what
I _guess_ is most likely:

> There have been anecdotal reports (in fact, I'm trying
to understand the why of it right now) of the suggester
taking a long time to initialize, even if you don't use it!
So if you're not using the suggest component, try
commenting out those sections in solrconfig.xml for
the cores in question. I like this explanation since it
fits with your symptoms, but I don't like it since the
index you are using isn't all that big. So it's something
of a shot in the dark. I expect that the core will
_eventually_ come up, but I've seen reports of 10-15
minutes being required, far beyond my patience! That
said, this would also explain why deleting the index
works.
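
The exact component names vary from config to config, but the idea is to
comment out the suggest component and the handler that references it,
along the lines of what ships in the example solrconfig.xml:

 <!-- Disabled: the suggester can be slow to build when a core loads.
 <searchComponent name="suggest" class="solr.SuggestComponent">
   <lst name="suggester">
     <str name="name">mySuggester</str>
     <str name="lookupImpl">FuzzyLookupFactory</str>
     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
     <str name="field">cat</str>
     <str name="suggestAnalyzerFieldType">string</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
   <lst name="defaults">
     <str name="suggest">true</str>
     <str name="suggest.dictionary">mySuggester</str>
   </lst>
   <arr name="components">
     <str>suggest</str>
   </arr>
 </requestHandler>
 -->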

> OutOfMemory errors. You might be able to attach
jConsole (part of the standard Java stuff) to the process
and monitor the memory usage. If it's being pushed near
the 5G limit that's the first thing I'd suspect.
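
Roughly, from a shell on the Solr box, something like this attaches to the
running Tomcat (the pgrep pattern is just an example; use whatever finds
your Tomcat process):

 # find the Tomcat/Solr JVM and attach jconsole to it
 SOLR_PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)
 jconsole "$SOLR_PID"

 # or sample heap/GC usage from the command line every 5 seconds
 jstat -gcutil "$SOLR_PID" 5s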

> If you're using the default setup, the Zookeeper
timeout may be too low. I think the default (not sure
whether it's been changed in 4.9) is 15 seconds; 30-60
seconds is usually much better.
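
With the newer solr.xml format that's roughly the following (the value is
in milliseconds; with the old <cores> style it's a zkClientTimeout
attribute instead, and it can also be passed as -DzkClientTimeout):

 <solr>
   <solrcloud>
     <str name="zkHost">${zkHost:}</str>
     <int name="zkClientTimeout">${zkClientTimeout:60000}</int>
   </solrcloud>
 </solr>

Keep in mind ZooKeeper itself caps sessions at maxSessionTimeout (20x
tickTime by default), so a 60-second client value may also need a
server-side change.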

Best,
Erick


Re: SOLR not starting after restart 2 node cloud setup

Posted by Doss <it...@gmail.com>.
Dear Erick,

Forgive my ignorance.

Please find some of the details you requested.

*have you looked at the solr logs?*

 > Sorry, I haven't defined the log4j.properties file, so I don't have Solr
logs. Since it requires a Tomcat restart, I am planning to do it at the
next restart.

But I found the following in the Tomcat log:

18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2]
org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web
application [/mima] appears to have started a thread named
[localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to
stop it. This is very likely to create a memory leak. Stack trace of thread:
 sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
 sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
 sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
 sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
 sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
 org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349)
 org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)


*How big are the cores?*

> We have 16 cores; only 5 of them are big. The total size of all 16
cores is 10+ GB.

*How many docs in the cores when the problem happens?*

1 core with 163 fields and 3,300,000 documents (index size 2+ GB)
4 cores with 3 fields and roughly 15,000,000 documents (1.2 to 1.5 GB)
The remaining cores have 100,000 to 4,000,000 documents.

*How much memory are you allocating to the JVM?*

5 GB for the JVM; the total RAM available on the system is 30 GB.

*can you restart Tomcat without a problem?*

This problem occurs in production, so I have never tried.


Thanks,
Doss.


Re: SOLR not starting after restart 2 node cloud setup

Posted by Erick Erickson <er...@gmail.com>.
You've really got to provide details for us to say much
of anything. There are about a zillion things that it could be.

In particular, have you looked at the solr logs? Are there
any interesting things in them? How big are the cores?
How much memory are you allocating to the JVM? How
many docs in the cores when the problem happens?
Before the nodes stop responding, can you restart
Tomcat without a problem?

You might review:
http://wiki.apache.org/solr/UsingMailingLists

Best,
Erick

