Posted to solr-user@lucene.apache.org by "David Smiley (@MITRE.org)" <DS...@mitre.org> on 2014/02/01 00:16:03 UTC
Clone (or Restore) Solrcloud
Hi,
I'm attempting to come up with a SolrCloud restore / clone process, either to
recover to a known good state or to clone the environment for
experimentation. At the moment my process involves either creating a new
ZooKeeper environment or at least deleting the existing collection so that I
can create a new one. This works; I use the Core API; the first command
defines the collection parameters, and I invoke it once for each replica. I
don't use the Collection API because I don't want SolrCloud to go off trying
to create all the replicas -- I know where each one is pre-positioned.
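To make the Core API approach above concrete, here is a minimal sketch that only builds the CoreAdmin CREATE request URLs, one per pre-positioned replica. The host names, core names, collection name and config name are all hypothetical. Sending numShards and collection.configName with the first call only follows the description above; whether later calls can safely omit them is an assumption you should check against your Solr version.

```python
from urllib.parse import urlencode


def core_create_url(base_url, core, collection, shard, collection_params=None):
    """Build a CoreAdmin CREATE URL that registers one core as a replica."""
    params = {
        "action": "CREATE",
        "name": core,
        "collection": collection,
        "shard": shard,
    }
    # Collection-level settings, if any, ride along on this call.
    params.update(collection_params or {})
    return base_url + "/admin/cores?" + urlencode(params)


# Hypothetical layout: (Solr base URL, core name, shard name) per replica.
REPLICAS = [
    ("http://host1:8983/solr", "mycoll_shard1_replica1", "shard1"),
    ("http://host2:8983/solr", "mycoll_shard1_replica2", "shard1"),
    ("http://host3:8983/solr", "mycoll_shard2_replica1", "shard2"),
    ("http://host4:8983/solr", "mycoll_shard2_replica2", "shard2"),
]

# Hypothetical collection parameters, sent with the first call only.
COLLECTION_PARAMS = {"numShards": 2, "collection.configName": "myconf"}

urls = []
for i, (base, core, shard) in enumerate(REPLICAS):
    extra = COLLECTION_PARAMS if i == 0 else None
    urls.append(core_create_url(base, core, "mycoll", shard, extra))

for url in urls:
    print(url)
```

The sketch stops at printing the URLs so it can be reviewed before anything is issued against a live cluster.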
What I'm concerned about is what happens once I start wanting to use Shard
splitting, *especially* if I don't want to split all shards because shards
are uneven due to custom routing (e.g. id:"customer!myid"). In this case I
don't know how to create the collection with the hash ranges post-shard
split. Solr doesn't have an API for me to explicitly say what the hash
ranges should be on each shard (to match up with a backup). And I'm
concerned about undocumented pitfalls that may exist in manually
constructing a clusterstate.json, as another approach.
Any ideas?
~ David
-----
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Clone-or-Restore-Solrcloud-tp4114773.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Clone (or Restore) Solrcloud
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Hi David,
The parent metadata persists only until the sub-shards become active.
Actually the logic to make the sub-shards active depends on knowing
when all 'sibling' sub-shards' replicas have recovered successfully.
We store the parent to make that easier to look up. Once all replicas
of all sub-shards have recovered, the shard states are updated. The
'updateshardstate' command also removes the 'parent' key from the
sub-shards while switching them to 'active'.
If you're seeing the 'parent' key on an 'active' sub-shard then it may
be a bug. Please paste your clusterstate and I'll look into why it was
left over.
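
The state transition described above can be sketched roughly as follows. This is an illustration over a plain dict shaped like the "shards" section of clusterstate.json, not Solr's actual code. The shard and replica names are hypothetical, and the 'construction' and 'recovering' state names are assumptions; only 'active', 'inactive' and the 'parent' key come from this thread.

```python
def all_replicas_active(shard):
    """True when every replica of the shard reports state 'active'."""
    return all(r["state"] == "active" for r in shard["replicas"].values())


def activate_sub_shards(shards, parent_name):
    """Switch sub-shards of parent_name to 'active' once all have recovered.

    Returns True if the switch happened, False if any sibling is not ready.
    """
    subs = [s for s in shards.values() if s.get("parent") == parent_name]
    if not subs or not all(all_replicas_active(s) for s in subs):
        return False
    for sub in subs:
        sub["state"] = "active"
        del sub["parent"]  # the lookup key is no longer needed
    shards[parent_name]["state"] = "inactive"
    return True


def leftover_parents(shards):
    """Names of 'active' shards that still carry a 'parent' key."""
    return [name for name, s in shards.items()
            if s["state"] == "active" and "parent" in s]


# Hypothetical cluster state in the middle of a split of shard1.
shards = {
    "shard1": {"state": "active",
               "replicas": {"core_node1": {"state": "active"}}},
    "shard1_0": {"state": "construction", "parent": "shard1",
                 "replicas": {"core_node2": {"state": "active"}}},
    "shard1_1": {"state": "construction", "parent": "shard1",
                 "replicas": {"core_node3": {"state": "recovering"}}},
}

first_try = activate_sub_shards(shards, "shard1")   # one sibling not ready
shards["shard1_1"]["replicas"]["core_node3"]["state"] = "active"
second_try = activate_sub_shards(shards, "shard1")  # all siblings recovered
print(first_try, second_try, leftover_parents(shards))
```

A check like leftover_parents, run against a real clusterstate.json, is one way to spot the left-over 'parent' key mentioned above.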
On Mon, Feb 3, 2014 at 10:19 AM, David Smiley (@MITRE.org)
<DS...@mitre.org> wrote:
> I think I figured this out; I hope people find this useful.
>
> It may not be possible to declare what the hash ranges are when you create
> the collection, but you *can* do so when you split via the 'ranges'
> parameter, which is a comma-delimited list. So this means you can create a
> new collection with one shard and then immediately split it to the desired
> ranges to line up with that of your backup. I also observed that if you
> create a collection and then split every shard (in 2), it will result in an
> equivalent collection to one that was created with twice as many shards to
> begin with. I hoped that was so and verified the ranges end up being the
> same both ways.
>
> The one thing that seems benign, though I'm not 100% certain, is that when
> you split a shard, the new shards carry a 'parent' reference to the name of
> the shard they were split from, and it remains even if you delete that
> parent shard (which is no longer needed; it becomes inactive). I'm not sure
> why this metadata is recorded because, at least after the split, I can't
> see why it's pertinent to anything.
>
> ~ David
--
Regards,
Shalin Shekhar Mangar.
Re: Clone (or Restore) Solrcloud
Posted by "David Smiley (@MITRE.org)" <DS...@mitre.org>.
I think I figured this out; I hope people find this useful.
It may not be possible to declare what the hash ranges are when you create
the collection, but you *can* do so when you split via the 'ranges'
parameter, which is a comma-delimited list. So this means you can create a
new collection with one shard and then immediately split it to the desired
ranges to line up with that of your backup. I also observed that if you
create a collection and then split every shard (in 2), it will result in an
equivalent collection to one that was created with twice as many shards to
begin with. I hoped that was so and verified the ranges end up being the
same both ways.
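
A rough sketch of both points, for illustration. The partition function below is a simplified even split of the signed 32-bit hash space, not Solr's own partitioning code, so treat the equivalence check as holding for this sketch (and for power-of-two shard counts) rather than as a statement about Solr internals. The host and collection names are hypothetical; SPLITSHARD and its 'ranges' parameter are the Collections API call described above, with ranges written in hex.

```python
from urllib.parse import urlencode

MIN_HASH, MAX_HASH = -2**31, 2**31 - 1  # signed 32-bit hash space


def partition(lo, hi, n):
    """Split inclusive [lo, hi] into n contiguous ranges.

    The last range absorbs any remainder.
    """
    step = (hi - lo + 1) // n
    ranges = []
    for i in range(n):
        start = lo + i * step
        end = hi if i == n - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def to_hex(rng):
    """Format a (min, max) pair as unsigned 32-bit hex, e.g. 80000000-ffffffff."""
    return "-".join(format(v & 0xFFFFFFFF, "x") for v in rng)


def splitshard_url(base_url, collection, shard, ranges):
    """Build a Collections API SPLITSHARD URL with explicit hash ranges."""
    params = {
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
        "ranges": ",".join(to_hex(r) for r in ranges),
    }
    return base_url + "/admin/collections?" + urlencode(params)


two = partition(MIN_HASH, MAX_HASH, 2)
four = partition(MIN_HASH, MAX_HASH, 4)

# Splitting each of the 2 ranges in half gives the same 4 ranges.
split_each = [sub for r in two for sub in partition(r[0], r[1], 2)]
same_both_ways = split_each == four

# One-shard collection split straight into the 4 ranges of the backup.
url = splitshard_url("http://host1:8983/solr", "mycoll", "shard1", four)
print(same_both_ways)
print(url)
```

For an unevenly split backup, the ranges list would instead be copied from the backed-up clusterstate.json rather than computed.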
The one thing that seems benign, though I'm not 100% certain, is that when
you split a shard, the new shards carry a 'parent' reference to the name of
the shard they were split from, and it remains even if you delete that
parent shard (which is no longer needed; it becomes inactive). I'm not sure
why this metadata is recorded because, at least after the split, I can't see
why it's pertinent to anything.
~ David
-----
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Clone-or-Restore-Solrcloud-tp4114773p4114983.html
Sent from the Solr - User mailing list archive at Nabble.com.