
Posted to solr-user@lucene.apache.org by Hendrik Haddorp <he...@gmx.net> on 2016/10/21 12:56:09 UTC

solr shutdown

Hi,

I'm running SolrCloud in foreground mode (-f). Does it make a difference
for Solr whether I stop it by pressing Ctrl-C, sending it a SIGTERM, or
using "solr stop"?

regards,
Hendrik

Re: solr shutdown

Posted by Hendrik Haddorp <he...@gmx.net>.

Thanks. I assume there is some issue on my side, as I did not actually
find any of the messages that the Solr script would log during the
shutdown. The shutdown also happened much faster than the 5 second delay
in the script, so I'm doing something wrong. Anyhow, thanks for the
further details; they should give me enough to investigate further.

On 22.10.2016 15:22, Erick Erickson wrote:
> bq:  Would a clean shutdown result in the node to be flagged as down
> in the cluster state straight away?
>
> It should, if it's truly clean. HOWEVER..... a "clean shutdown" is
> unfortunately not just a "bin/solr stop" because of the timeout Shawn
> mentioned, see SOLR-9371. It's a simple edit to make it much longer,
> but the real fix should poll. The "smoking gun" would be a correlation
> between the node not being marked as down in state.json and a message
> when you stop the instance with bin/solr about "forcefully killing
> ....."
>
> After only 5 seconds, that script forcefully kills the instance of
> Solr which would _not_ flag the replicas it hosts as down. After an
> interval, you should see it disappear from the "live nodes" znode
> though. The problem of course is that part of graceful shutdown is
> each replica updating the associated state.json, and they don't get a
> chance. ZK will periodically ping the Solr instance and if it times
> out remove the associated znode in "live nodes"....
>
> Solr code checks both the state.json and live_nodes to know whether a
> node is truly functioning, being absent from live_nodes trumps
> whatever state is in state.json.
>
> Best,
> Erick
>
>
>
>
> On Sat, Oct 22, 2016 at 1:00 AM, Hendrik Haddorp
> <he...@gmx.net> wrote:
>> Thanks, that was what I was hoping for I just didn't see any indication for
>> that in the normal log output.
>>
>> The reason for asking is that I have a SolrCloud 6.2.1 setup and when ripple
>> restarting the nodes I sometimes get errors. So far I have seen two
>> different things:
>> 1) The node starts up again and is able to receive new replicas but all
>> existing replicas are broken.
>> 2) All nodes come up and no problems are seen in the cluster status but the
>> admin UI on one node claims that a file for one config set is missing.
>> Restarting the node resolves the issue.
>>
>> This looked to me like the node is not going down cleanly. Would a clean
>> shutdown result in the node to be flagged as down in the cluster state
>> straight away? So far the ZooKeeper data gets only updated once the node
>> comes up again and reports itself as down before the recovery starts.
>>
>> On 21.10.2016 15:01, Shawn Heisey wrote:
>>> On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
>>>> I'm running solrcloud in foreground mode (-f). Does it make a
>>>> difference for Solr if I stop it by pressing ctrl-c, sending it a
>>>> SIGTERM or using "solr stop"?
>>> All of those should produce the same result in the end -- Solr's
>>> shutdown hook will be called and a graceful shutdown will commence.
>>>
>>> Note that in the case of the "bin/solr stop" command, the default is to
>>> only wait five seconds for graceful shutdown before proceeding to a
>>> forced kill, which for a typical install, means that forced kills become
>>> the norm rather than the exception.  We have an issue to increase the
>>> max timeout, but it hasn't been done yet.
>>>
>>> I strongly recommend anyone going into production should edit the script
>>> to increase the timeout.  For the shell script I would do at least 60
>>> seconds.  The Windows script just does a pause, not an intelligent wait,
>>> so going that high probably isn't advisable on Windows.
>>>
>>> Thanks,
>>> Shawn
>>>

Re: solr shutdown

Posted by Erick Erickson <er...@gmail.com>.

bq:  Would a clean shutdown result in the node to be flagged as down
in the cluster state straight away?

It should, if it's truly clean. However, a "clean shutdown" is
unfortunately not just a "bin/solr stop", because of the timeout Shawn
mentioned; see SOLR-9371. It's a simple edit to make the timeout much
longer, but the real fix should poll. The "smoking gun" would be a
correlation between the node not being marked as down in state.json and
a message when you stop the instance with bin/solr about "forcefully
killing ....."

After only 5 seconds, that script forcefully kills the instance of
Solr, which would _not_ flag the replicas it hosts as down. After an
interval, you should see it disappear from the "live nodes" znode,
though. The problem, of course, is that part of a graceful shutdown is
each replica updating the associated state.json, and with a forced kill
they don't get a chance. ZK will periodically ping the Solr instance
and, if it times out, remove the associated znode in "live nodes".

Solr code checks both state.json and live_nodes to know whether a
node is truly functioning; being absent from live_nodes trumps
whatever state is in state.json.
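
As a rough illustration of that precedence rule, here is a minimal
Python sketch. It is not Solr's actual code: the function name
effective_replica_state and the sample node names are made up for
illustration, and it works on plain dicts and sets rather than on real
ZooKeeper data.

```python
from typing import Dict, Set


def effective_replica_state(replica: Dict[str, str], live_nodes: Set[str]) -> str:
    """Return the state a client should trust for one replica.

    `replica` mimics one replica entry from state.json (assumed here to
    carry "node_name" and "state"); `live_nodes` holds the names of the
    children of the live_nodes znode.
    """
    # Absence from live_nodes overrides whatever state.json claims.
    if replica["node_name"] not in live_nodes:
        return "down"
    return replica["state"]


# Hypothetical data: host2 was force-killed, so its state.json entry is stale.
replicas = [
    {"node_name": "host1:8983_solr", "state": "active"},
    {"node_name": "host2:8983_solr", "state": "active"},
]
live = {"host1:8983_solr"}
states = [effective_replica_state(r, live) for r in replicas]
```

With this sample data the second replica is treated as down even though
its state.json entry still says "active".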

Best,
Erick

On Sat, Oct 22, 2016 at 1:00 AM, Hendrik Haddorp
<he...@gmx.net> wrote:
> Thanks, that was what I was hoping for I just didn't see any indication for
> that in the normal log output.
>
> The reason for asking is that I have a SolrCloud 6.2.1 setup and when ripple
> restarting the nodes I sometimes get errors. So far I have seen two
> different things:
> 1) The node starts up again and is able to receive new replicas but all
> existing replicas are broken.
> 2) All nodes come up and no problems are seen in the cluster status but the
> admin UI on one node claims that a file for one config set is missing.
> Restarting the node resolves the issue.
>
> This looked to me like the node is not going down cleanly. Would a clean
> shutdown result in the node to be flagged as down in the cluster state
> straight away? So far the ZooKeeper data gets only updated once the node
> comes up again and reports itself as down before the recovery starts.
>
> On 21.10.2016 15:01, Shawn Heisey wrote:
>>
>> On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
>>>
>>> I'm running solrcloud in foreground mode (-f). Does it make a
>>> difference for Solr if I stop it by pressing ctrl-c, sending it a
>>> SIGTERM or using "solr stop"?
>>
>> All of those should produce the same result in the end -- Solr's
>> shutdown hook will be called and a graceful shutdown will commence.
>>
>> Note that in the case of the "bin/solr stop" command, the default is to
>> only wait five seconds for graceful shutdown before proceeding to a
>> forced kill, which for a typical install, means that forced kills become
>> the norm rather than the exception.  We have an issue to increase the
>> max timeout, but it hasn't been done yet.
>>
>> I strongly recommend anyone going into production should edit the script
>> to increase the timeout.  For the shell script I would do at least 60
>> seconds.  The Windows script just does a pause, not an intelligent wait,
>> so going that high probably isn't advisable on Windows.
>>
>> Thanks,
>> Shawn
>>
>

Re: solr shutdown

Posted by Hendrik Haddorp <he...@gmx.net>.

Thanks, that was what I was hoping for. I just didn't see any indication
of that in the normal log output.

The reason for asking is that I have a SolrCloud 6.2.1 setup and when 
ripple restarting the nodes I sometimes get errors. So far I have seen 
two different things:
1) The node starts up again and is able to receive new replicas but all 
existing replicas are broken.
2) All nodes come up and no problems are seen in the cluster status but 
the admin UI on one node claims that a file for one config set is 
missing. Restarting the node resolves the issue.

This looked to me like the node is not going down cleanly. Would a clean
shutdown result in the node being flagged as down in the cluster state
straight away? So far the ZooKeeper data only gets updated once the node
comes up again and reports itself as down before the recovery starts.

On 21.10.2016 15:01, Shawn Heisey wrote:
> On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
>> I'm running solrcloud in foreground mode (-f). Does it make a
>> difference for Solr if I stop it by pressing ctrl-c, sending it a
>> SIGTERM or using "solr stop"?
> All of those should produce the same result in the end -- Solr's
> shutdown hook will be called and a graceful shutdown will commence.
>
> Note that in the case of the "bin/solr stop" command, the default is to
> only wait five seconds for graceful shutdown before proceeding to a
> forced kill, which for a typical install, means that forced kills become
> the norm rather than the exception.  We have an issue to increase the
> max timeout, but it hasn't been done yet.
>
> I strongly recommend anyone going into production should edit the script
> to increase the timeout.  For the shell script I would do at least 60
> seconds.  The Windows script just does a pause, not an intelligent wait,
> so going that high probably isn't advisable on Windows.
>
> Thanks,
> Shawn
>

Re: solr shutdown

Posted by Mark Miller <ma...@gmail.com>.

That is probably partly because of HDFS cache key unmapping. I think I
improved that in some issue at some point.

We really want to wait for a long time by default, though: even 10
minutes or more. If you have tons of SolrCores, each of them has to be
torn down, each of them might commit on close, custom code and resources
may be in use and need to be released, and a lot of time can
legitimately be spent. Given that these long shutdowns will normally be
legitimate and not a hang, I think we want to be willing to wait a long
time. A user who finds this too long can always kill the process
themselves or lower the wait, but except in exceptional situations they
will usually pay for that with a non-clean shutdown.

- Mark

On Fri, Oct 21, 2016 at 12:10 PM Joe Obernberger <
joseph.obernberger@gmail.com> wrote:

> Thanks Shawn - We've had to increase this to 300 seconds when using a
> large cache size with HDFS, and a fairly heavily loaded index routine (3
> million docs per day).  I don't know if that's why it takes a long time
> to shutdown, but it can take a while for solr cloud to shutdown
> gracefully.  If it does not, you end up with write.lock files for some
> (if not all) of the shards, and have to delete them manually before
> restarting.
>
> -Joe
>
>
> On 10/21/2016 9:01 AM, Shawn Heisey wrote:
> > On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
> >> I'm running solrcloud in foreground mode (-f). Does it make a
> >> difference for Solr if I stop it by pressing ctrl-c, sending it a
> >> SIGTERM or using "solr stop"?
> > All of those should produce the same result in the end -- Solr's
> > shutdown hook will be called and a graceful shutdown will commence.
> >
> > Note that in the case of the "bin/solr stop" command, the default is to
> > only wait five seconds for graceful shutdown before proceeding to a
> > forced kill, which for a typical install, means that forced kills become
> > the norm rather than the exception.  We have an issue to increase the
> > max timeout, but it hasn't been done yet.
> >
> > I strongly recommend anyone going into production should edit the script
> > to increase the timeout.  For the shell script I would do at least 60
> > seconds.  The Windows script just does a pause, not an intelligent wait,
> > so going that high probably isn't advisable on Windows.
> >
> > Thanks,
> > Shawn
> >
>
> --
- Mark
about.me/markrmiller

Re: solr shutdown

Posted by Joe Obernberger <jo...@gmail.com>.

Thanks Shawn - We've had to increase this to 300 seconds when using a
large cache size with HDFS and a fairly heavily loaded indexing routine
(3 million docs per day).  I don't know if that's why it takes a long
time to shut down, but it can take a while for SolrCloud to shut down
gracefully.  If it does not shut down gracefully, you end up with
write.lock files for some (if not all) of the shards, and have to delete
them manually before restarting.
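
To check for leftover lock files after an unclean shutdown, something
like the following Python sketch could list them. It assumes the index
lives on a local filesystem (on HDFS you would need HDFS tooling
instead) and that the lock file is named write.lock as described above;
find_write_locks is a hypothetical helper name. It only lists files and
deletes nothing.

```python
from pathlib import Path
from typing import List


def find_write_locks(data_root: str) -> List[Path]:
    """Recursively list every file named write.lock under data_root."""
    return sorted(p for p in Path(data_root).rglob("write.lock") if p.is_file())
```

Only remove a lock file once you are sure no Solr process is still
using that index.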

-Joe


On 10/21/2016 9:01 AM, Shawn Heisey wrote:
> On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
>> I'm running solrcloud in foreground mode (-f). Does it make a
>> difference for Solr if I stop it by pressing ctrl-c, sending it a
>> SIGTERM or using "solr stop"?
> All of those should produce the same result in the end -- Solr's
> shutdown hook will be called and a graceful shutdown will commence.
>
> Note that in the case of the "bin/solr stop" command, the default is to
> only wait five seconds for graceful shutdown before proceeding to a
> forced kill, which for a typical install, means that forced kills become
> the norm rather than the exception.  We have an issue to increase the
> max timeout, but it hasn't been done yet.
>
> I strongly recommend anyone going into production should edit the script
> to increase the timeout.  For the shell script I would do at least 60
> seconds.  The Windows script just does a pause, not an intelligent wait,
> so going that high probably isn't advisable on Windows.
>
> Thanks,
> Shawn
>

Re: solr shutdown

Posted by Shawn Heisey <ap...@elyograg.org>.

On 10/21/2016 6:56 AM, Hendrik Haddorp wrote:
> I'm running solrcloud in foreground mode (-f). Does it make a
> difference for Solr if I stop it by pressing ctrl-c, sending it a
> SIGTERM or using "solr stop"? 

All of those should produce the same result in the end -- Solr's
shutdown hook will be called and a graceful shutdown will commence.

Note that in the case of the "bin/solr stop" command, the default is to
wait only five seconds for graceful shutdown before proceeding to a
forced kill, which, for a typical install, means that forced kills
become the norm rather than the exception.  We have an issue to increase
the max timeout, but it hasn't been done yet.

I strongly recommend that anyone going into production edit the script
to increase the timeout.  For the shell script I would do at least 60
seconds.  The Windows script just does a pause, not an intelligent wait,
so going that high probably isn't advisable on Windows.
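
To illustrate the difference between a fixed pause and an intelligent
wait, here is a minimal Python sketch. It is not the logic of the
bin/solr script itself: the helper name stop_with_timeout is
hypothetical, and it assumes the process was started as a child through
subprocess.Popen. It asks the process to terminate, returns as soon as
the process exits, and only force-kills once the timeout has elapsed.

```python
import subprocess


def stop_with_timeout(proc: subprocess.Popen, timeout_s: float = 60.0) -> str:
    """Request a graceful stop; force-kill only if timeout_s elapses."""
    proc.terminate()  # SIGTERM on POSIX, so shutdown hooks get a chance to run
    try:
        # Returns as soon as the process exits, not after a fixed pause.
        proc.wait(timeout=timeout_s)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: no shutdown hook runs
        proc.wait()  # reap the killed process
        return "forced"
```

A return value of "forced" corresponds to the case where the shutdown
hook did not get to finish.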

Thanks,
Shawn