Posted to solr-user@lucene.apache.org by Pablo Anzorena <an...@gmail.com> on 2016/06/07 12:08:21 UTC

SolrCloud SolrNode stopping randomly for no reason

Hey,

I'm using SolrCloud with two nodes (5.2.1). Once or twice a day, node1
stops for no apparent reason. I checked the logs, but no errors are being
logged.
I also have a standalone Solr service running in production on both nodes
(we are migrating to SolrCloud, which is why).

Thanks.

Re: SolrCloud SolrNode stopping randomly for no reason

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/7/2016 9:55 AM, Pablo Anzorena wrote:
> Sorry for the lack of detail. I didn't post the log files because
> there was nothing out of the ordinary in solr.log,
> solr-8984-console.log, or solr_gc.log. Which log do you want me
> to show you? solr.log.1 (which I think should be the most recent one),
> for example? Do you need the tail or the head of the file? When I say
> "stopping for no reason" I mean the service is no longer running: the
> process has exited. I tried killing it with kill -9, and that does not
> get logged. My first thought was that I had restarted the standalone
> Solr service, which tries to stop the service and, if it can't,
> kills it with SOLR_PROCESS_ID="`ps -eaf | grep -v "grep" | grep
> "start.jar";kill -9 ${SOLR_PROCESS_ID}. So sometimes it could kill
> SolrCloud instead of the standalone instance, but the timestamps do
> not always match. Another possibility is that it throws an
> OutOfMemoryError and the OOM script kills the process, but again I saw
> nothing in solr_gc.log.

I'm pretty sure that nothing would get logged in the gc log for an
OutOfMemoryError.  It might show up in solr.log (or one of the rotated
or renamed solr.log files), but depending on exactly what code throws
the OOME, it's also possible that the actual exception won't be logged
at all.
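
One way to check every rotated or renamed log at once is sketched below. It
is only a sketch: the function name and the file patterns are assumptions,
so adjust them to match what is actually in your logs directory. As noted
above, an empty result does not prove that no OOME happened.

```shell
# Print the name of every Solr log file in a directory that mentions
# an OutOfMemoryError. find_oom_logs is a hypothetical helper name.
find_oom_logs() {
  log_dir="$1"
  # solr.log* also matches rotated files such as solr.log.1;
  # *console.log covers files such as solr-8984-console.log.
  for f in "$log_dir"/solr.log* "$log_dir"/*console.log; do
    # Skip patterns that matched nothing.
    [ -f "$f" ] || continue
    # -l prints only the file name when the file contains a match.
    grep -l "OutOfMemoryError" "$f" || true
  done
}

# Usage (path is an example only): find_oom_logs /var/solr/logs
```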

The bin/solr script on 5.2.1 uses the OOM killer option incorrectly --
so it doesn't even work.  If you fix the commandline to make it work,
then it would create a solr_oom_killer_STUFF logfile.

I would strongly recommend editing bin/solr to increase the "waiting to
start/die" timeout from 5 seconds to 30-60 seconds, especially if you
are running more than one Solr or Jetty process on the machine.  It
might also be a good idea to open an issue requesting a change in how
the script figures out which process gets the "kill -9" signal.
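
To illustrate the problem with picking the process by "start.jar" alone,
here is a rough sketch of selecting the PID by port instead. It assumes each
Solr instance was started with a -Djetty.port=<port> argument visible in the
ps output; solr_pid_for_port is a made-up helper name, not something from
bin/solr.

```shell
# Read "ps -eo pid,args" style lines on stdin and print the PID of the
# start.jar process whose command line carries -Djetty.port=<port>.
solr_pid_for_port() {
  port="$1"
  awk -v want="-Djetty.port=$port" '
    /start\.jar/ {
      # Compare whole arguments so that 8984 never matches 89840.
      for (i = 2; i <= NF; i++) {
        if ($i == want) { print $1; exit }
      }
    }'
}

# Usage (not run here): ps -eo pid,args | solr_pid_for_port 8984
```

With two Jetty processes on one machine, this only ever returns the one
bound to the requested port, so a restart of the standalone service could
not pick up the SolrCloud node by accident.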

Thanks,
Shawn


Re: SolrCloud SolrNode stopping randomly for no reason

Posted by Pablo Anzorena <an...@gmail.com>.
Sorry for the lack of detail. I didn't post the log files because there
was nothing out of the ordinary in solr.log, solr-8984-console.log, or
solr_gc.log.

Which log do you want me to show you? solr.log.1 (which I think should be
the most recent one), for example? Do you need the tail or the head of the
file?

When I say "stopping for no reason" I mean the service is no longer
running: the process has exited. I tried killing it with kill -9, and that
does not get logged. My first thought was that I had restarted the
standalone Solr service, which tries to stop the service and, if it can't,
kills it with SOLR_PROCESS_ID="`ps -eaf | grep -v "grep" | grep
"start.jar";kill -9 ${SOLR_PROCESS_ID}. So sometimes it could kill
SolrCloud instead of the standalone instance, but the timestamps do not
always match. Another possibility is that it throws an OutOfMemoryError and
the OOM script kills the process, but again I saw nothing in solr_gc.log.

2016-06-07 10:18 GMT-03:00 Shawn Heisey <ap...@elyograg.org>:

> On 6/7/2016 6:08 AM, Pablo Anzorena wrote:
> > I'm using SolrCloud with two nodes (5.2.1). Once or twice a day, node1
> > stops for no apparent reason. I checked the logs, but no errors are being
> > logged.
> > I also have a standalone Solr service running in production on both nodes
> > (we are migrating to SolrCloud, which is why).
>
> https://wiki.apache.org/solr/UsingMailingLists
>
> There are no real details to your message.  What precisely does
> "stopping for no reason" mean?  What does Solr *do*?  We cannot see your
> system, you must tell us what is happening with considerable detail.
>
> It seems highly unlikely that Solr would misbehave without logging
> *something*.  Are you looking at the Logging tab in the admin UI, or the
> actual solr.log file?  The solr.log file is the only reliable place to
> look.  When you restart Solr, the current logfile is renamed and a new
> solr.log will be created.
>
> Thanks,
> Shawn
>
>

Re: SolrCloud SolrNode stopping randomly for no reason

Posted by Shawn Heisey <ap...@elyograg.org>.
On 6/7/2016 6:08 AM, Pablo Anzorena wrote:
> I'm using SolrCloud with two nodes (5.2.1). Once or twice a day, node1
> stops for no apparent reason. I checked the logs, but no errors are being
> logged.
> I also have a standalone Solr service running in production on both nodes
> (we are migrating to SolrCloud, which is why).

https://wiki.apache.org/solr/UsingMailingLists

There are no real details to your message.  What precisely does
"stopping for no reason" mean?  What does Solr *do*?  We cannot see your
system, you must tell us what is happening with considerable detail.

It seems highly unlikely that Solr would misbehave without logging
*something*.  Are you looking at the Logging tab in the admin UI, or the
actual solr.log file?  The solr.log file is the only reliable place to
look.  When you restart Solr, the current logfile is renamed and a new
solr.log will be created.
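
Because of that renaming, the file covering the moment the node stopped is
usually not the current solr.log. A small sketch for listing the candidates
by modification time follows; the solr*log* pattern and the function name
are assumptions, since the exact names of the renamed files depend on your
setup.

```shell
# List Solr log files in a directory, newest first, so the file that was
# being written when the node stopped is easy to pick out.
logs_newest_first() {
  log_dir="$1"
  # ls -t orders by modification time, newest first. The pattern is a
  # guess meant to catch solr.log, solr.log.1 and renamed copies.
  ( cd "$log_dir" && ls -t solr*log* 2>/dev/null ) || true
}

# Usage (path is an example only): logs_newest_first /var/solr/logs
```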

Thanks,
Shawn