Posted to solr-user@lucene.apache.org by Michael <so...@gmail.com> on 2009/11/12 18:17:18 UTC

Stop solr without losing documents

I've got a process external to Solr that is constantly feeding it new
documents, retrying if Solr is not responding.  What's the right way to
stop Solr (running in Tomcat) so no documents are lost?

Currently I'm committing all cores and then running catalina's stop
script, but between my commit and the stop, more documents can come in
that would need *another* commit...
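For concreteness, the commit step is roughly this (a SolrJ sketch;
the URL and core names are made up):

// Commit every core, then stop the container externally.  The race:
// anything indexed after these commits but before the stop is not
// yet durable in Solr.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CommitAllThenStop {
    public static void main(String[] args) throws Exception {
        String[] cores = {"core0", "core1"};  // hypothetical core names
        for (String core : cores) {
            try (SolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/" + core).build()) {
                solr.commit();  // flush this core's buffered docs to disk
            }
        }
        // ...then run catalina's stop script from the shell.
    }
}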

Lots of people must have had this problem already, so I know the
answer is simple; I just can't find it!

Thanks.
Michael

Re: Stop solr without losing documents

Posted by Michael <so...@gmail.com>.
On Fri, Nov 13, 2009 at 4:09 PM, Chris Hostetter
<ho...@fucit.org> wrote:
> please don't kill -9 ... it's grossly overkill, and doesn't give your
[ ... snip ... ]
> Alternately, you could take advantage of the "enabled" feature from your
> client (just have it test the enabled URL every N updates or so), and when
> it sees that you have disabled the port it can send one last commit and
> then stop sending updates until it sees the enabled URL work again -- as
> soon as you see the updates stop, you can safely shut down the port.

Thanks, Hoss.  I'll use Catalina stop instead of kill -9.

It's good to know about the enabled feature -- my team was just
discussing whether something like that existed that we could use --
but as we'd also like to recover cleanly from power failures and other
Solr terminations, I think we'll track which docs are uncommitted
outside of Solr.
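Schematically, the tracking we have in mind looks like this (just a
sketch -- the "pending" set stands in for whatever durable store we
end up using, and all names are made up):

// Record each doc id as pending before sending it; only a successful
// commit clears the set.  After a crash (or power failure), whatever
// is still pending gets re-sent.
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class TrackedUpdater {
    private final SolrClient solr;
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    TrackedUpdater(SolrClient solr) { this.solr = solr; }

    void send(SolrInputDocument doc) throws Exception {
        pending.add((String) doc.getFieldValue("id"));
        solr.add(doc);
    }

    void checkpoint() throws Exception {
        solr.commit();    // everything added so far is now durable
        pending.clear();  // a real version would snapshot the set first
    }
}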

Michael

Re: Stop solr without losing documents

Posted by Michael <so...@gmail.com>.
On Fri, Nov 13, 2009 at 11:02 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> So I think the question is really:
> "If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits".

Exactly right, Otis.

> I don't have the Solr source handy, but if I did, I'd look for "Shutdown", "Hook" and "finalize" in the code.

Thanks for the direction.  I found some discussion of close()ing a
SolrCore, but I don't believe that close implies a commit.

I somehow hadn't thought of actually *trying* to add a doc and then
shut down a Solr instance; shame on me.  Unfortunately, when I test
this via
 * make a new solr
 * add a doc
 * commit
 * verify it shows up in a search -- it does
 * add a 2nd doc
 * shutdown
solr doesn't stop.  It stops accepting connections, but the java
process refuses to actually die.  Not sure what we're doing wrong on
our end, but I see this frequently and end up having to do a kill
(usually not -9!).  I guess we'll stick with externally tracking
which docs have been committed, so that when we inevitably have to
kill Solr it doesn't cause a problem.
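For the record, the test was essentially this (SolrJ sketch; the core
URL and field names are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ShutdownTest {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/test").build()) {
            SolrInputDocument d1 = new SolrInputDocument();
            d1.addField("id", "1");
            solr.add(d1);
            solr.commit();
            long n = solr.query(new SolrQuery("id:1"))
                         .getResults().getNumFound();
            System.out.println("doc 1 visible: " + (n == 1));  // true
            SolrInputDocument d2 = new SolrInputDocument();
            d2.addField("id", "2");
            solr.add(d2);  // buffered but not committed
        }
        // ...then run the container's stop script and see if doc 2
        // survives the restart.
    }
}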

Michael

Re: Stop solr without losing documents

Posted by Michael <so...@gmail.com>.
On Fri, Nov 13, 2009 at 11:45 PM, Lance Norskog <go...@gmail.com> wrote:
> I would go with polling Solr to find what is not yet there. In
> production, it is better to assume that things will break, and have
> backstop janitors that fix them. And then test those janitors
> regularly.

Good idea, Lance.  I certainly agree with the idea of backstop
janitors.  We don't have a good way of polling Solr for what's there
or not -- we have a kind of asynchronous, multithreaded updating
system sending docs to Solr -- but we can always find out
*externally* which docs have been committed.

Michael

Re: Stop solr without losing documents

Posted by Lance Norskog <go...@gmail.com>.
I would go with polling Solr to find what is not yet there. In
production, it is better to assume that things will break, and have
backstop janitors that fix them. And then test those janitors
regularly.
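A janitor can be as simple as this sketch (SolrJ; the id field and
the source-of-truth list are placeholders):

// Compare a source-of-truth list of ids against what Solr can find,
// and return the ones that need to be re-fed.
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;

public class Janitor {
    static List<String> findMissing(SolrClient solr, List<String> expected)
            throws Exception {
        List<String> missing = new ArrayList<>();
        for (String id : expected) {
            SolrQuery q = new SolrQuery("id:" + id);  // assumes plain ids
            q.setRows(0);  // only the count matters
            if (solr.query(q).getResults().getNumFound() == 0) {
                missing.add(id);
            }
        }
        return missing;  // re-send these, then re-check
    }
}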

On Fri, Nov 13, 2009 at 8:02 PM, Otis Gospodnetic
<ot...@yahoo.com> wrote:
> So I think the question is really:
> "If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits".
>
> I don't have the Solr source handy, but if I did, I'd look for "Shutdown", "Hook" and "finalize" in the code.
>
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
> [ ... snip ... ]



-- 
Lance Norskog
goksron@gmail.com

Re: Stop solr without losing documents

Posted by Otis Gospodnetic <ot...@yahoo.com>.
So I think the question is really:
"If I stop the servlet container, does Solr issue a commit in the shutdown hook in order to ensure all buffered docs are persisted to disk before the JVM exits".

I don't have the Solr source handy, but if I did, I'd look for "Shutdown", "Hook" and "finalize" in the code.
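(For reference, the mechanism in question is a plain JVM shutdown
hook.  A client-side equivalent would look like the sketch below --
this only illustrates the mechanism, and is not a claim that Solr
registers such a hook; all names are made up:)

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class CommitOnShutdown {
    public static void main(String[] args) {
        SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/core0").build();
        // Runs on a clean stop or SIGTERM -- never on kill -9.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                solr.commit();  // flush anything still buffered
                solr.close();
            } catch (Exception e) {
                e.printStackTrace();  // too late to retry; just log
            }
        }));
        // ... feed documents ...
    }
}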

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: Chris Hostetter <ho...@fucit.org>
> To: solr-user@lucene.apache.org
> Sent: Fri, November 13, 2009 4:09:00 PM
> Subject: Re: Stop solr without losing documents
>
> [ ... snip ... ]


Re: Stop solr without losing documents

Posted by Chris Hostetter <ho...@fucit.org>.
: which documents have been updated before a successful commit.  Now
: stopping solr is as easy as kill -9.

please don't kill -9 ... it's grossly overkill, and doesn't give your
servlet container a fair chance to clean things up.  A lot of work has been
done to make Lucene indexes robust to hard terminations of the JVM (or
physical machine), but there's no reason to go out of your way to try and
stab it in the heart when you could just shut it down cleanly.

that's not to say your approach isn't a good one -- if you only have one
client sending updates/commits then having it keep track of what was
indexed prior to the last successful commit is a viable way to deal with
what happens if solr stops responding (either because you shut it down, or
because it crashed for some other reason).

Alternately, you could take advantage of the "enabled" feature from your
client (just have it test the enabled URL every N updates or so), and when
it sees that you have disabled the port it can send one last commit and
then stop sending updates until it sees the enabled URL work again -- as
soon as you see the updates stop, you can safely shut down the port.
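in SolrJ terms, something like this (a sketch only -- it assumes the
ping handler is wired to a healthcheck file so the ping URL starts
failing once you disable the core, and all names are made up):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class EnabledAwareFeeder {
    private static final int CHECK_EVERY = 100;  // test every N updates
    private long sent = 0;

    boolean enabled(SolrClient solr) {
        try {
            return solr.ping().getStatus() == 0;
        } catch (Exception e) {
            return false;  // an error/503 counts as disabled
        }
    }

    void feed(SolrClient solr, SolrInputDocument doc) throws Exception {
        if (sent % CHECK_EVERY == 0 && !enabled(solr)) {
            solr.commit();            // one last commit...
            while (!enabled(solr)) {  // ...then hold updates
                Thread.sleep(5000);
            }
        }
        solr.add(doc);
        sent++;
    }
}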


-Hoss


Re: Stop solr without losing documents

Posted by Michael <so...@gmail.com>.
On Fri, Nov 13, 2009 at 4:32 AM, gwk <gi...@eyefi.nl> wrote:
> I don't know if this is the best solution, or even if it's applicable to
> your situation, but we do incremental updates from a database based on a
> timestamp (from a simple separate SQL table filled by triggers, so deletes

Thanks, gwk!  This doesn't exactly meet our needs, but helped us get
to a solution.  In short, we are manually committing in our outside
updater process (instead of letting Solr autocommit), and marking
which documents have been updated before a successful commit.  Now
stopping solr is as easy as kill -9.
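Roughly, the pattern is this (a sketch -- markCommitted() stands in
for whatever durable record we keep outside Solr):

// Batch docs, commit manually, and only then mark the batch as safe.
// If Solr dies first -- even via kill -9 -- unmarked docs are simply
// re-sent on restart.
import java.util.List;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ManualCommitUpdater {
    void indexBatch(SolrClient solr, List<SolrInputDocument> batch)
            throws Exception {
        solr.add(batch);       // buffered in Solr, not yet durable
        solr.commit();         // flushed; only now is the batch safe
        markCommitted(batch);  // record success outside Solr
    }

    void markCommitted(List<SolrInputDocument> batch) {
        // stand-in: write the doc ids / high-water mark to a DB or file
    }
}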

Michael

Re: Stop solr without losing documents

Posted by gwk <gi...@eyefi.nl>.
Michael wrote:
> I've got a process external to Solr that is constantly feeding it new
> documents, retrying if Solr is not responding.  What's the right way to
> stop Solr (running in Tomcat) so no documents are lost?
>
> Currently I'm committing all cores and then running catalina's stop
> script, but between my commit and the stop, more documents can come in
> that would need *another* commit...
>
> Lots of people must have had this problem already, so I know the
> answer is simple; I just can't find it!
>
> Thanks.
> Michael
>   
I don't know if this is the best solution, or even if it's applicable to
your situation, but we do incremental updates from a database based on a
timestamp (from a simple separate SQL table filled by triggers, so deletes
are measured correctly as well).  We store this timestamp in Solr, too.
Our index script first does a simple Solr request for the newest timestamp,
then selects the documents to update with a "SELECT * FROM document_updates
WHERE timestamp >= X", where X is the timestamp returned from Solr.  (We
use >= for the hopefully extremely rare case where two updates land at the
same time and the index script runs at that same moment, retrieving only
one of them; this can cause some documents to be updated multiple times,
but as document updates are idempotent this is no real problem.)
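Schematically, it looks like this (SolrJ + JDBC sketch; the field and
table names are ours and almost certainly not yours):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;

public class IncrementalIndexer {
    static void update(SolrClient solr, Connection db) throws Exception {
        // Ask Solr for the newest stored timestamp (assumes the index
        // already contains at least one document).
        SolrQuery q = new SolrQuery("*:*");
        q.setSort("last_update", SolrQuery.ORDER.desc);
        q.setRows(1);
        Object newest = solr.query(q).getResults()
                            .get(0).getFieldValue("last_update");

        // >= on purpose: simultaneous updates may be fetched twice, but
        // since document updates are idempotent that is harmless.
        PreparedStatement ps = db.prepareStatement(
                "SELECT * FROM document_updates WHERE timestamp >= ?");
        ps.setObject(1, newest);
        try (ResultSet rs = ps.executeQuery()) {
            // ... build SolrInputDocuments from rs and add/commit ...
        }
    }
}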

Regards,

gwk