Posted to solr-user@lucene.apache.org by Jed Glazner <jg...@adobe.com> on 2012/08/18 00:48:38 UTC
How to make a server become a replica / leader for a collection at
startup
Hello All,
I'm working to solve an interesting problem. When I pull a server out of
the cloud (for maintenance, say) and then bring it back up, it doesn't
automatically sync with ZooKeeper and become a leader or replica for any
collections that I created while it was offline, even though I specified
more shards or replicas than there were servers registered with ZooKeeper.
*Here is my setup:*
External ZooKeeper (v3.3.5) ensemble (zk1, zk2, zk3)
SolrCloud (4.0.0-BETA) with 2 shards and 2 replicas (shard1, shard2,
shard1a, shard2a)
*Here is the detailed scenario:*
I create a new collection named 'collection2' using the collections API
and specify 2 shards and 2 replicas:
curl 'http://shard1:8983/solr/admin/collections?action=CREATE&name=collection2&numShards=2&numReplicas=2'
The call creates (as I would expect) 2 shards and 2 replicas.
I then push some docs into 'collection2' and I see the documents are
distributed between shard1 and shard2 and are replicated to 1a and 2a.
So far so good.
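
For anyone scripting this, here is a minimal Python sketch that builds the
same CREATE request URL. The host, port and parameter names are the ones
from the curl call above; the helper name create_collection_url is made up
for illustration, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode


def create_collection_url(host, name, num_shards, num_replicas, port=8983):
    """Build the collections API CREATE URL used in the curl call above.

    Hypothetical helper: it only assembles the query string and does not
    contact Solr.
    """
    params = urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "numReplicas": num_replicas,
    })
    return f"http://{host}:{port}/solr/admin/collections?{params}"


url = create_collection_url("shard1", "collection2", 2, 2)
print(url)
```

The printed URL can be passed to curl, or to urllib.request.urlopen, against
a running node.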
Now, to simulate a node failure, I take down shard1a while pushing some
more docs into 'collection2'. While shard1a is down I also create a new
collection named 'collection3' using the collections API and specify 2
shards and 2 replicas. The call creates (as I would expect) 2 shards and
only 1 replica: with shard1a down, there are not enough servers to create
all of the replicas.
Before bringing shard1a back up I push some documents into 'collection3'
and see the docs are distributed between shard1 and shard2, with shard2a
replicating shard2. So far everything looks great and works as expected.
When I bring shard1a back on-line however, here is what I would *expect*
to happen:
1. Shard1a registers with ZooKeeper, and ZooKeeper assigns it as a replica
of shard1 for 'collection2' (it knows about collection2 because it's
stored in the solr.xml).
2. Shard1a asks ZooKeeper if there are any collections that have missing
replicas, or not enough shards.
3. ZooKeeper responds that 'collection3' on shard1 doesn't have a
replica (remember I created the collection with 2 replicas but only one
is present).
4. Shard1a creates a new core and becomes a replica for 'collection3' on
shard1.
5. Shard1a synchronizes with shard1 and replicates the missing documents
for 'collection2' and 'collection3'.
However here is what really happens:
1. Shard1a registers with ZooKeeper and is assigned as a replica of shard1
for 'collection2'.
2. Shard1a synchronizes with shard1 and replicates the missing documents
for 'collection2'.
Nothing else happens.
How can I make shard1a automatically become a replica or a leader for
missing cores within a collection when it comes online?
--
*Jed Glazner*
Sr. Software Engineer
Adobe Social
385.221.1072 (tel)
801.360.0181 (cell)
jglazner@adobe.com
550 East Timpanogus Circle
Orem, UT 84097-6215, USA
www.adobe.com
Re: How to make a server become a replica / leader for a collection
at startup
Posted by Mark Miller <ma...@gmail.com>.
Hmm...last email was blocked from the list as spam :)
Let me try again forcing plain text:
Hey Jed,
I think what you are looking for is something I have proposed, but that is
not implemented yet. We started with a fairly simple collections API since
we just wanted to make sure we had something in 4.0.
I would like it to be better though. My proposal was that when you create a
new collection with n shards and z replicas, that should be recorded in
ZooKeeper by the Overseer. The Overseer should then watch for when a new
node comes up, then trigger a process that compares the config for the
collection against the real world, and remove or add based on that info.
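
A rough sketch of that comparison step, in plain Python with no Solr or
ZooKeeper calls. This is not how the Overseer is implemented: the function
name plan_additions, the dictionary layout and the "cores_per_shard" key
are all hypothetical, and the data simply restates Jed's scenario (node
names are his hostnames). It assumes a node hosts at most one core per
collection, and it only covers the "add" half, not removal.

```python
def plan_additions(desired, actual, live_nodes):
    """Return (collection, shard, node) assignments that fill missing cores.

    desired:    {collection: {"shards": [shard names], "cores_per_shard": int}}
    actual:     {collection: {shard name: [nodes hosting a core]}}
    live_nodes: set of nodes currently registered
    """
    plan = []
    for collection in sorted(desired):
        spec = desired[collection]
        hosted = actual.get(collection, {})
        # Assumption: a node already holding a core of this collection
        # is not given a second one.
        busy = {node for nodes in hosted.values() for node in nodes}
        free = [node for node in sorted(live_nodes) if node not in busy]
        for shard in spec["shards"]:
            missing = spec["cores_per_shard"] - len(hosted.get(shard, []))
            while missing > 0 and free:
                plan.append((collection, shard, free.pop(0)))
                missing -= 1
    return plan


# Desired layout as recorded at creation time (hypothetical structure).
desired = {
    "collection2": {"shards": ["shard1", "shard2"], "cores_per_shard": 2},
    "collection3": {"shards": ["shard1", "shard2"], "cores_per_shard": 2},
}
# Real-world layout just after node shard1a comes back online.
actual = {
    "collection2": {"shard1": ["shard1", "shard1a"],
                    "shard2": ["shard2", "shard2a"]},
    "collection3": {"shard1": ["shard1"],
                    "shard2": ["shard2", "shard2a"]},
}
live_nodes = {"shard1", "shard2", "shard1a", "shard2a"}

plan = plan_additions(desired, actual, live_nodes)
print(plan)  # [('collection3', 'shard1', 'shard1a')]
```

With the returning node back in live_nodes, the only gap found is the
missing core for shard1 of 'collection3', which is assigned to shard1a.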
I don't think it's that difficult to do, but given a lot of other things we
are working on, and the worry of destabilizing anything before the 4
release, I think it's more likely to come in a point release later. It's
not super complicated work, but there are some tricky corner cases I think.
- Mark