Posted to solr-user@lucene.apache.org by स्वप्निल रोकडे <sw...@gmail.com> on 2019/06/17 07:01:56 UTC

Losing data on Solr restart

I am a newbie to Apache Manifold (ver 2.12) and Solr (ver 7.6) with
Zookeeper (ver 3.14). I have created three collections in Solr; data
for two of them comes from Manifold, while the third is populated by
manual inserts through the simple Solr API. When I run jobs in
Manifold I can see data getting inserted into Solr, and it shows up
when querying Solr.

But when I restart Solr, all the shards and replicas go down and never
recover. I am also unable to reload the collections; reloading always
gives a timeout error. I tried taking an index backup and restoring it,
but the restore also fails with a timeout error. I ran the reload and
restore commands from the same server on which Solr is installed, and
they still fail. The problem seems limited to the collections whose
data comes from Manifold; my other collection, where I insert data via
the Solr API, starts properly after a restart. I don't see any errors
in the Solr logs.

I am not sure whether I missed something in the configuration, or
whether there is some kind of lock on Solr that prevents the reload and
restore commands from working, so that I lose everything when Solr
restarts.

Please suggest.
Regards,
Swapnil

Re: Losing data on Solr restart

Posted by Erick Erickson <er...@gmail.com>.
There are a number of possibilities, but what this really sounds
like is that you aren’t committing your documents. There’s more
than you want to know here:
https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

But here’s the short form: In the absence of a hard commit
(openSearcher=true or false is irrelevant), two things happen:
1> Segments are not closed, and if you shut down Solr
     hard (e.g. kill -9 or the like) everything indexed since
     the last commit will be lost. You can still search them
     before you restart if you have _soft_ commits set to
     something other than -1.
2> If you kill Solr hard, then everything since the last hard
     commit is replayed from the transaction log at startup, 
     which would account for the nodes not coming back up.
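
To make that concrete, the commit intervals live in the
<updateHandler> section of solrconfig.xml. A minimal sketch; the
millisecond values here are only examples, tune them for your setup:

    <updateHandler class="solr.DirectUpdateHandler2">
      <updateLog/>                       <!-- the transaction log -->
      <autoCommit>
        <maxTime>15000</maxTime>         <!-- hard commit every 15 seconds -->
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>60000</maxTime>         <!-- soft commit (visibility) every minute; -1 disables -->
      </autoSoftCommit>
    </updateHandler>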

So here’s what I’d try to verify whether this is even close to correct:
1> change solrconfig to hard commit (openSearcher=false probably)
     more frequently, say every 15 seconds.
2> Wait at least that long after your indexing is done before you 
     stop Solr.
3> Stop Solr gracefully, using the bin/solr script or the like. Pay
     attention to the termination message when you do: does it
     say anything like “forcefully killing Solr” or similar? If so,
     then the bin/solr script is un-gracefully killing Solr. There’s
     an environment variable in the script you can set to give
     Solr more time (see the sketch after this list).
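
For step 3>, something along these lines; the port and wait time are
just examples, and SOLR_STOP_WAIT can also be set in solr.in.sh:

    # allow up to 5 minutes for a clean shutdown before bin/solr force-kills
    SOLR_STOP_WAIT=300 bin/solr stop -p 8983

If you still see “forcefully killing” even with a generous wait, that’s
another hint there’s a lot of uncommitted work to deal with.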

You do not need to issue commits from your client; that’s usually
a bad practice unless you can guarantee that there’s only a single
client and it issues one commit at the very end of the run. To
troubleshoot, though, you can issue a commit from the browser, 
SOLR_ADDRESS:PORT/solr/collection/update?commit=true
will do it. It’d be instructive to see how long that takes to come back
as well.
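
For example, from a shell (the host, port, and collection name here
are just placeholders for whatever yours happen to be):

    time curl "http://localhost:8983/solr/yourcollection/update?commit=true"

A commit that takes minutes to come back points in the same direction.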

So what this sounds like is that you have a massive number of 
uncommitted documents when Solr stops and it replays them
on startup. Whether that’s the real problem here or not you’ll have
to experiment to determine.

Best,
Erick
