Posted to solr-user@lucene.apache.org by adfel70 <ad...@gmail.com> on 2013/11/19 14:18:41 UTC

Question regarding possibility of data loss

Hi, we plan to set up SolrCloud with ZooKeeper.
We're going to have 6 Solr servers with 2 instances on each server, 6 shards
with a replication factor of 2, and 3 ZooKeeper nodes.

Our concern is that we will send documents to be indexed, and Solr will
fail to index them without returning any error message, so we will suffer
data loss.

1. Is there any situation that can cause this kind of problem?
2. Can it happen if some of the ZKs are down, or some of the Solr instances?
3. How can we monitor them? Can we do anything to prevent these kinds of
errors?

Thanks in advance

Re: Question regarding possibility of data loss

Posted by Mark Miller <ma...@gmail.com>.
I’d recommend you start with the upcoming 4.6 release. It should be out this week or next.

- Mark



Re: Question regarding possibility of data loss

Posted by Daniel Collins <da...@gmail.com>.
Regarding data loss, Solr returns an error code to the calling app (either
an HTTP error code, or the equivalent exception in SolrJ), so if it fails to
index for a known reason, you'll know about it.
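A minimal Python sketch of what "you'll know about it" means on the client side, assuming plain HTTP against the collection's /update handler with JSON documents (SolrJ users get an exception instead). The base URL, the collection name and the `IndexingError`, `check_update_response` and `index_docs` names are illustrative assumptions. The idea is that the client treats a batch as indexed only when Solr returns a 2xx status and `responseHeader.status` is 0; anything else raises.

```python
import json
import urllib.error
import urllib.request


class IndexingError(Exception):
    """Raised when Solr does not confirm that an update was accepted."""


def check_update_response(status, body):
    """Raise IndexingError unless the HTTP status is 2xx and Solr's
    responseHeader.status is 0."""
    if not 200 <= status < 300:
        raise IndexingError(f"HTTP {status}: {body[:200]!r}")
    try:
        parsed = json.loads(body)
    except ValueError as err:  # json.JSONDecodeError subclasses ValueError
        raise IndexingError(f"unparseable response: {body[:200]!r}") from err
    header = parsed.get("responseHeader", {}) if isinstance(parsed, dict) else {}
    if header.get("status") != 0:
        raise IndexingError(f"Solr status {header.get('status')}: {body[:200]!r}")


def index_docs(base_url, collection, docs, timeout=30):
    """POST a batch of documents as JSON to the collection's /update handler."""
    req = urllib.request.Request(
        f"{base_url}/{collection}/update?wt=json",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx (e.g. 503 when a shard has no live replica).
        status, body = err.code, err.read()
    except OSError as err:  # connection refused, DNS failure, timeout, ...
        raise IndexingError(f"could not reach Solr: {err}") from err
    check_update_response(status, body)
```

For example, `index_docs("http://localhost:8983/solr", "collection1", [{"id": "doc1"}])` (host, port and collection are assumed values) either returns quietly or raises, so a failed batch cannot be mistaken for a successful one.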

There are always edge cases though.

If Solr indexes the document (returns success), that means the document is
in the transaction log (and should be in the log for each replica).
If someone pulls the plug on the machines and the hard drives crash, the
transaction log might not be replayable when the system comes back
up...

Now Solr won't tell you what's trashed (since it can't possibly know). At
that point your whole collection might be corrupt, but *presumably* you
will have a backup available (onsite or off) and a checkpoint time of when
you took that backup, so you can replay any indexing work that might have
happened since then.
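Replaying "any indexing work that might have happened since then" assumes the client can still find that work. One way to arrange this, sketched here under the assumption that you keep your own record of acknowledged batches, is to timestamp each batch once Solr confirms it and select everything at or after the backup checkpoint. The `IndexJournal` class is hypothetical, and a real journal would need durable storage (for example your source database or a message queue) rather than process memory.

```python
import time
from dataclasses import dataclass, field


@dataclass
class IndexJournal:
    """Hypothetical client-side journal of batches Solr acknowledged, kept so
    that batches newer than a backup checkpoint can be re-sent after a restore."""
    entries: list = field(default_factory=list)  # (timestamp, docs) pairs

    def record(self, docs, ts=None):
        # Call only after Solr has returned success for this batch.
        self.entries.append((time.time() if ts is None else ts, list(docs)))

    def since(self, checkpoint):
        """Documents acknowledged at or after the backup checkpoint."""
        return [doc for ts, docs in self.entries if ts >= checkpoint for doc in docs]

    def prune(self, checkpoint):
        """Drop entries already covered by a backup taken at `checkpoint`."""
        self.entries = [(ts, docs) for ts, docs in self.entries if ts >= checkpoint]
```

Erring on the side of replaying too much is harmless as long as the schema defines a uniqueKey, because Solr replaces an existing document that has the same key.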

Admittedly that's extreme, but it depends on how cast-iron a guarantee you
want :)

But in all seriousness, Shawn is right: Solr is stable, and if it can't
index a doc it will tell you.
If ALL ZK nodes are down, or all Solr servers for a particular shard are
down, you will get an error when you try to index anything (HTTP
503 Service Unavailable, or the SolrJ equivalent).
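In both of those cases the 503 may go away once ZooKeeper or the shard's servers come back, so a client may choose to retry rather than give up. The sketch below assumes a hypothetical `send(docs)` callable that returns the HTTP status; it retries only 503 with exponential backoff and raises on anything else, so a batch is never silently dropped. The policy (five attempts, doubling delay) is an arbitrary assumption, not a Solr recommendation.

```python
import time


class TransientSolrError(Exception):
    """HTTP 503: e.g. ZooKeeper quorum lost or no live server for a shard."""


class PermanentSolrError(Exception):
    """Any other failure; resending the same request will not help."""


def classify(status):
    """Map an HTTP status to None (success) or the exception to raise."""
    if 200 <= status < 300:
        return None
    if status == 503:
        return TransientSolrError(f"HTTP {status}")
    return PermanentSolrError(f"HTTP {status}")


def send_with_retry(send, docs, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send(docs) -> HTTP status, retrying 503s with exponential backoff.
    Returns the number of tries it took; raises if the batch is never accepted."""
    for attempt in range(attempts):
        error = classify(send(docs))
        if error is None:
            return attempt + 1
        if isinstance(error, PermanentSolrError) or attempt == attempts - 1:
            raise error
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    raise ValueError("attempts must be >= 1")
```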


On 19 November 2013 15:35, Shawn Heisey <so...@elyograg.org> wrote:

> On 11/19/2013 6:18 AM, adfel70 wrote:
> > Hi, we plan to establish an ensemble of solr with zookeeper.
> > We gonna have 6 solr servers with 2 instances on each server, also we'll
> > have 6 shards with replication factor 2, in addition we'll have 3
> > zookeepers.
>
> You'll want to do one Solr instance per machine.  Each Solr instance can
> house many cores (shard replicas).  More than one instance per machine
> will: 1) Add memory/CPU overhead.  2) Accidentally and easily result in
> a situation where multiple replicas for a single shard are located on
> the same machine.
>
> > Our concern is that we will send documents to index and solr won't index
> > them but won't send any error message and we will suffer a data loss
> >
> > 1. Is there any situation that can cause this kind of problem?
> > 2. Can it happen if some of ZKs are down? or some of the solr instances?
> > 3. How can we monitor them? Can we do something to prevent these kind of
> > errors?
>
> 1) If it does become possible for data loss to occur without notifying
> your application, it will be considered a very serious bug, and top
> priority will be given to fixing it.  A release with the fix will be
> made as quickly as possible.  Of course I cannot guarantee that such
> bugs don't exist, but I am not aware of any at the moment.
>
> 2) You must have a majority ([n/2] + 1) of zookeepers operational.  If
> you have three or four zookeepers, one zookeeper can be down and
> SolrCloud will continue to function perfectly.  With five or six
> zookeepers, two can be down.  With seven or eight, three can be down.
> As far as Solr itself, if one replica of each shard from a collection is
> working, then the entire collection will work.  That means you'll want
> to have at least replicationFactor=2, so there are two copies of each
> shard.
>
> 3) There are MANY options for monitoring.  Many of them are completely
> free, and it is always possible to write your own.  One high-level thing
> you can do is make sure the hosts are up and that they are running the
> proper number of java processes.  Solr offers a number of API entry
> points that will tell you how things are working, and more are added
> over time.  I don't think there are any zookeeper-specific informational
> capabilities at the moment, but I did file a bug report asking for the
> feature.  When I have some time, I will work on a fix for it.  One of
> the other committers may decide to work on it as well.
>
> If you want out-of-the-box Solr-specific monitoring and are willing to
> pay for it, Sematext offers SPM.  One of Sematext's employees is very
> active on this list, and they just added Zookeeper monitoring to their
> capabilities.  They do have a free version, but it has extremely limited
> monitoring history.
>
> http://sematext.com/
>
> Thanks,
> Shawn
>
>

Re: Question regarding possibility of data loss

Posted by Shawn Heisey <so...@elyograg.org>.
On 11/19/2013 6:18 AM, adfel70 wrote:
> Hi, we plan to set up SolrCloud with ZooKeeper.
> We're going to have 6 Solr servers with 2 instances on each server, 6 shards
> with a replication factor of 2, and 3 ZooKeeper nodes.

You'll want to run one Solr instance per machine.  Each Solr instance can
house many cores (shard replicas).  Running more than one instance per
machine will 1) add memory/CPU overhead, and 2) make it easy to
accidentally end up with multiple replicas of a single shard on the same
machine.
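The second problem can be checked mechanically. A small sketch, assuming you have already collected one `(shard, node)` pair per core with nodes written as `host:port` (in SolrCloud 4.x this information lives in the cluster state Solr keeps in ZooKeeper); the function name is hypothetical. Two instances on one machine differ only by port, so the check compares hosts.

```python
from collections import defaultdict


def shards_with_colocated_replicas(replicas):
    """replicas: iterable of (shard, node) pairs, where node is "host:port".
    Returns the sorted shard names that have two or more replicas on the
    same host, whichever Solr instance (port) they live in."""
    hosts_by_shard = defaultdict(list)
    for shard, node in replicas:
        host = node.rsplit(":", 1)[0]  # instances on one box share the host part
        hosts_by_shard[shard].append(host)
    return sorted(
        shard for shard, hosts in hosts_by_shard.items()
        if len(set(hosts)) < len(hosts)
    )
```

With two instances per server, a layout containing `("shard1", "server1:8983")` and `("shard1", "server1:8984")` is reported as `["shard1"]`: both copies of shard1 disappear if server1 goes down.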

> Our concern is that we will send documents to be indexed, and Solr will
> fail to index them without returning any error message, so we will suffer
> data loss.
>
> 1. Is there any situation that can cause this kind of problem?
> 2. Can it happen if some of the ZKs are down, or some of the Solr instances?
> 3. How can we monitor them? Can we do anything to prevent these kinds of
> errors?

1) If it does become possible for data loss to occur without notifying
your application, it will be considered a very serious bug, and top
priority will be given to fixing it.  A release with the fix will be
made as quickly as possible.  Of course I cannot guarantee that such
bugs don't exist, but I am not aware of any at the moment.

2) You must have a majority (floor(n/2) + 1) of ZooKeeper nodes operational.
If you have three or four ZooKeepers, one can be down and SolrCloud will
continue to function perfectly.  With five or six ZooKeepers, two can be
down.  With seven or eight, three can be down.
As far as Solr itself, if one replica of each shard from a collection is
working, then the entire collection will work.  That means you'll want
to have at least replicationFactor=2, so there are two copies of each shard.
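The majority arithmetic can be checked directly. This sketch evaluates floor(n/2) + 1 with integer division and derives how many ZooKeeper nodes may fail; the function names are illustrative.

```python
def zk_quorum(n):
    """Smallest majority of an n-node ZooKeeper ensemble: floor(n/2) + 1."""
    return n // 2 + 1


def zk_tolerated_failures(n):
    """How many nodes can be down while the rest still form a majority."""
    return n - zk_quorum(n)


for n in range(3, 9):
    print(f"{n} nodes: quorum {zk_quorum(n)}, can lose {zk_tolerated_failures(n)}")
# 3 nodes: quorum 2, can lose 1
# 4 nodes: quorum 3, can lose 1
# 5 nodes: quorum 3, can lose 2
# 6 nodes: quorum 4, can lose 2
# 7 nodes: quorum 4, can lose 3
# 8 nodes: quorum 5, can lose 3
```

Each even size tolerates no more failures than the odd size just below it, which is why ensembles are usually built with an odd number of nodes, such as the three planned here.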

3) There are MANY options for monitoring.  Many of them are completely
free, and it is always possible to write your own.  One high-level thing
you can do is make sure the hosts are up and that they are running the
proper number of Java processes.  Solr offers a number of API entry
points that will tell you how things are working, and more are added
over time.  I don't think Solr has any ZooKeeper-specific informational
capabilities at the moment, but I did file a bug report asking for the
feature.  When I have some time, I will work on a fix for it.  One of
the other committers may decide to work on it as well.
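In the meantime, ZooKeeper itself answers simple four-letter-word commands that a monitoring script can poll. A rough sketch, assuming ZooKeeper listens on its default client port 2181 and that `ruok` is permitted (ZooKeeper 3.5 and later only answer commands listed in `4lw.commands.whitelist`); the host names and function names are hypothetical. An `imok` reply only means the process is up and serving its client port, not that it has joined the quorum; the `stat` command gives more detail.

```python
import socket


def zk_ruok(host, port=2181, timeout=2.0):
    """Send ZooKeeper's 'ruok' command; True only if the server answers 'imok'."""
    chunks = []
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            while True:
                data = sock.recv(64)
                if not data:  # ZooKeeper closes the connection after replying
                    break
                chunks.append(data)
    except OSError:  # refused, unreachable, or timed out
        return False
    return b"".join(chunks).strip() == b"imok"


def ensemble_status(results):
    """results maps host -> bool from zk_ruok. Returns (number responding,
    whether that number is at least a majority of the ensemble)."""
    up = sum(1 for ok in results.values() if ok)
    return up, up >= len(results) // 2 + 1
```

A cron job or monitoring agent could call `ensemble_status({h: zk_ruok(h) for h in ["zk1", "zk2", "zk3"]})` and alert well before the responding count drops below a majority.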

If you want out-of-the-box Solr-specific monitoring and are willing to
pay for it, Sematext offers SPM.  One of Sematext's employees is very
active on this list, and they just added Zookeeper monitoring to their
capabilities.  They do have a free version, but it has extremely limited
monitoring history.

http://sematext.com/

Thanks,
Shawn