Posted to solr-user@lucene.apache.org by Henrik Brautaset Aronsen <he...@synth.no> on 2016/06/08 09:40:32 UTC

Re-create shard with compositeId router and known hash range

Hi.

We have a SolrCloud setup with 20 shards, each with only 1 replica, served
on 8 servers.

After a server went down we are left with 16 shards, which means that some
of the compositeId hash ranges aren't hosted by any cores.  Somehow the
shards/cores didn't come back after the server came up again.  I can see
the server in /live_nodes.
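For context on why some adds fail while others succeed: as I understand Solr's default compositeId router, an id of the form prefix!docid is hashed with 32-bit MurmurHash3 (x86 variant, seed 0). The top 16 bits come from the prefix hash and the low 16 bits from the doc-id hash, and the document goes to the shard whose range contains the resulting signed 32-bit value. The Python sketch below reproduces that computation under those assumptions. It splits only on the first "!" and ignores the optional prefix/bits syntax and three-part ids, so treat it as an illustration rather than Solr's code.

```python
# Sketch of compositeId routing hashes, assuming Solr's default 16/16 bit split.
MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86_32; returns an unsigned 32-bit value."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    nblocks = len(data) // 4
    for i in range(nblocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    # Remaining 1-3 bytes, assembled little-endian.
    tail = data[4 * nblocks:]
    k = 0
    for j in range(len(tail)):
        k ^= tail[j] << (8 * j)
    if tail:
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def to_signed32(u: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed Java-style int."""
    return u - (1 << 32) if u & 0x80000000 else u


def composite_id_hash(doc_id: str) -> int:
    """Signed 32-bit routing hash for 'prefix!docid' or a plain id."""
    if "!" not in doc_id:
        return to_signed32(murmur3_x86_32(doc_id.encode("utf-8")))
    prefix, rest = doc_id.split("!", 1)
    h1 = murmur3_x86_32(prefix.encode("utf-8"))
    h2 = murmur3_x86_32(rest.encode("utf-8"))
    # Top half from the prefix, bottom half from the document id.
    return to_signed32((h1 & 0xFFFF0000) | (h2 & 0x0000FFFF))
```

Because the prefix decides the top 16 bits, all ids sharing a prefix land in the same slice of the hash space. An add therefore fails consistently for a given prefix when that slice belongs to a missing shard.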

But all is not bad: The data in the collection is volatile with a TTL of 30
minutes, and we have a failover in place that tries a new random
compositeId whenever an "add" operation fails.
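A failover along these lines can be sketched as follows. The add callable is hypothetical: it stands in for whatever client call sends the document and reports success. The 8-character random prefix is likewise an arbitrary choice, not something from this thread.

```python
import random
import string
from typing import Callable, Optional


def add_with_random_prefix(
    doc_suffix: str,
    add: Callable[[str], bool],
    max_attempts: int = 5,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Retry `add` with a fresh random routing prefix until it succeeds.

    `add` is a hypothetical callable taking a compositeId and returning
    True if the update was accepted. Returns the accepted id, or None.
    """
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase + string.digits
    for _ in range(max_attempts):
        # A new prefix moves the document to a different part of the hash space.
        prefix = "".join(rng.choices(alphabet, k=8))
        doc_id = f"{prefix}!{doc_suffix}"
        if add(doc_id):
            return doc_id
    return None
```

Assuming 4 of 20 equally sized ranges are unhosted and prefixes hash uniformly, each attempt misses with probability about 0.2, so five attempts all fail with probability roughly 0.2^5 ≈ 0.0003.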

My question is: Is it possible to re-create the missing shards or do I have
to delete and create the collection from scratch?

I know which hash ranges are missing, but the CREATESHARD [1] API call
doesn't support shards with the 'compositeId' router.  And I cannot use
SPLITSHARD [2] since it only divides the original shard's hash range.
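Working out which ranges are missing can be done from the surviving shards alone. This sketch assumes the range strings look like the hex pairs Solr writes into state.json (for example 80000000-8ccbffff, read as signed 32-bit ints). It reports every part of the full hash space that no listed shard covers.

```python
from typing import List, Tuple

FULL_MIN, FULL_MAX = -(1 << 31), (1 << 31) - 1


def _to_signed32(v: int) -> int:
    return v - (1 << 32) if v & 0x80000000 else v


def parse_range(r: str) -> Tuple[int, int]:
    """'80000000-8ccbffff' -> (min, max) as signed 32-bit ints."""
    lo_hex, hi_hex = r.split("-")
    return _to_signed32(int(lo_hex, 16)), _to_signed32(int(hi_hex, 16))


def format_range(lo: int, hi: int) -> str:
    """Inverse of parse_range, using 8-digit lowercase hex."""
    return f"{lo & 0xFFFFFFFF:08x}-{hi & 0xFFFFFFFF:08x}"


def missing_ranges(ranges: List[str]) -> List[str]:
    """Return the gaps in the full hash space left by the given shard ranges."""
    spans = sorted(parse_range(r) for r in ranges)
    gaps, next_start = [], FULL_MIN
    for lo, hi in spans:
        if lo > next_start:
            gaps.append(format_range(next_start, lo - 1))
        next_start = max(next_start, hi + 1)
    if next_start <= FULL_MAX:
        gaps.append(format_range(next_start, FULL_MAX))
    return gaps
```

For example, three quarters of the space still present gives back the one missing quarter.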

Best regards,
Henrik


[1]
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api8
[2]
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api3

Re: Re-create shard with compositeId router and known hash range

Posted by Henrik Brautaset Aronsen <he...@synth.no>.
On Mon, Jun 13, 2016 at 4:59 PM, Erick Erickson <er...@gmail.com>
wrote:

> Yes, Solr will pick that up. You won't have any replicas
> though so you'll have to ADDREPLICA afterwards.
> You could use the EMPTY option on the createNodeSet
> of the Collections API to create a dummy collection
> to see what a no-replica shard should look like as
> a model.
>

Thanks, that's good to know.  I'll definitely try that if the problem
reoccurs.

Henrik

Re: Re-create shard with compositeId router and known hash range

Posted by Erick Erickson <er...@gmail.com>.
Yes, Solr will pick that up. You won't have any replicas
though so you'll have to ADDREPLICA afterwards.
You could use the EMPTY option on the createNodeSet
of the Collections API to create a dummy collection
to see what a no-replica shard should look like as
a model.
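As a sketch, the two Collections API calls mentioned here could be built like this. The host, collection, config set, shard, and node names are placeholders, not values from this thread.

```python
from urllib.parse import urlencode


def collections_api_url(base: str, params: dict) -> str:
    """Build a Collections API request URL from a parameter dict."""
    return base.rstrip("/") + "/admin/collections?" + urlencode(params)


BASE = "http://localhost:8983/solr"  # hypothetical node address

# Throwaway collection with shards but no replicas, to inspect what a
# replica-less shard looks like in ZooKeeper.
create_dummy = collections_api_url(BASE, {
    "action": "CREATE",
    "name": "dummy_model",                  # hypothetical collection name
    "numShards": 20,
    "createNodeSet": "EMPTY",               # create the shards, place no replicas
    "collection.configName": "my_config",   # hypothetical config set
})

# Once a shard exists again, give it a replica on a live node.
add_replica = collections_api_url(BASE, {
    "action": "ADDREPLICA",
    "collection": "my_collection",
    "shard": "shard17",                     # hypothetical re-created shard
    "node": "10.0.0.5:8983_solr",           # hypothetical live node name
})
```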

Best,
Erick



On Mon, Jun 13, 2016 at 2:40 AM, Henrik Brautaset Aronsen
<he...@synth.no> wrote:
> [snip]

Re: Re-create shard with compositeId router and known hash range

Posted by Henrik Brautaset Aronsen <he...@synth.no>.
On Mon, Jun 13, 2016 at 1:50 AM, Erick Erickson <er...@gmail.com>
wrote:

> So to be clear we're talking about the same thing, your
> Zookeeper has a collections>>my_collection>>state.json
> ZNode. In that state.json you have information
> for all the shards, and you're saying that you had
> only 16 when there used to be 20, right?
>

Yup.

> If that's the case I'd be really curious how that happened,
> because there's no reason Solr should have done
> that.
>

I agree, it's disturbing!  I thought about just editing state.json (through
zkNavigator), adding the missing shards.  Would Solr have picked that up?
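The edit described here amounts to adding a shard entry with a hash range and no replicas. Below is a minimal sketch on an already-parsed state.json. It assumes the entry only needs range, state, and an empty replicas map; compare against a real replica-less shard before relying on that.

```python
import json


def add_empty_shard(state: dict, collection: str, shard: str, hash_range: str) -> dict:
    """Return a copy of a parsed state.json with one replica-less shard added.

    The entry's fields ('range', 'state', 'replicas') are an assumption
    modeled on what an existing shard entry looks like.
    """
    new_state = json.loads(json.dumps(state))  # deep copy via JSON round-trip
    shards = new_state[collection]["shards"]
    if shard in shards:
        raise ValueError(f"{shard} already exists")
    shards[shard] = {"range": hash_range, "state": "active", "replicas": {}}
    return new_state
```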

Henrik

Re: Re-create shard with compositeId router and known hash range

Posted by Erick Erickson <er...@gmail.com>.
What's most disturbing here is that the zookeeper info
disappeared for certain shards. This simply should not be
happening just because you lost a server or two. The info
should still be in Zookeeper for the shards in question. All
the replicas will be down of course, but they'll still be there.

So to be clear we're talking about the same thing, your
Zookeeper has a collections>>my_collection>>state.json
ZNode. In that state.json you have information
for all the shards, and you're saying that you had
only 16 when there used to be 20, right?

(Note: older Solr versions keep all of this in
/clusterstate.json.)
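A quick way to check the shard count is to read that znode and list its shards. This sketch assumes the payload is keyed by collection name with a shards map underneath, which as far as I know holds for both state.json and the older /clusterstate.json layout. Fetching the bytes from ZooKeeper is left to whatever client you use.

```python
import json
from typing import List


def state_znode_path(collection: str, legacy: bool = False) -> str:
    """ZooKeeper path that holds the collection's cluster state."""
    return "/clusterstate.json" if legacy else f"/collections/{collection}/state.json"


def shard_names(raw: bytes, collection: str) -> List[str]:
    """Shard names recorded for `collection` in a state.json/clusterstate.json payload."""
    state = json.loads(raw.decode("utf-8"))
    return sorted(state[collection]["shards"])
```

In the situation described in this thread, len(shard_names(...)) would have come back as 16 rather than 20.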

If that's the case I'd be really curious how that happened,
because there's no reason Solr should have done
that.

Best,
Erick



On Fri, Jun 10, 2016 at 12:35 PM, Henrik Brautaset Aronsen
<he...@synth.no> wrote:
> [snip]

Re: Re-create shard with compositeId router and known hash range

Posted by Henrik Brautaset Aronsen <he...@synth.no>.
On Fri, Jun 10, 2016 at 6:18 PM, Erick Erickson <er...@gmail.com>
wrote:

> Well, how brave do you want to be ;)?


Hi Erick, thanks for your reply!


> There's no great magic to the
> Zookeeper nodes here. If you do everything just right you could create
> one manually. By that I mean you could "hand edit" the znode with the
> Zookeeper commands, you'd have to dig for the exact commands....


So, if I create (or edit) the correct entries in ZK, Solr should just pick
that up and behave accordingly?  I thought I had to do this through the
Solr API.  I think I'll experiment some more with this.


> You _may_ be able to use the ADDREPLICA command, assuming that the shard
> information is still in the ZK node. I haven't tried this however.
>

The shard information is gone from zookeeper (I guess that's what you mean
by ZK node?), and I can't specify hash ranges through the ADDREPLICA
command.


> All that said, if the node is somehow permanently gone, you have to
> re-index anyway to get the data back so recreating the collection
> would be less fooling around.
>

I'm not really interested in the data, since all data in the collection has
a TTL of 30 minutes.

I ended up re-creating the collection even though it gave me a couple of
minutes of downtime.  If this happens again, it would be awesome if I could
manually create the shards with the specified hash ranges.

Cheers,
Henrik

Re: Re-create shard with compositeId router and known hash range

Posted by Erick Erickson <er...@gmail.com>.
Well, how brave do you want to be ;)? There's no great magic to the
Zookeeper nodes here. If you do everything just right you could create
one manually. By that I mean you could "hand edit" the znode with the
Zookeeper commands; you'd have to dig for the exact commands. You
_may_ be able to use the ADDREPLICA command, assuming that the shard
information is still in the ZK node. I haven't tried this however.
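If you do go the hand-editing route, a read-modify-write guarded by the znode version avoids clobbering a concurrent update. The sketch assumes a client shaped like kazoo's KazooClient: get(path) returns the data plus a stat carrying a version, and set(path, value, version=...) refuses to write if that version has changed. The mutate function is whatever change you want to make.

```python
import json
from typing import Callable


def edit_znode_json(zk, path: str, mutate: Callable[[dict], dict]) -> None:
    """Read a JSON znode, apply `mutate`, and write it back only if unchanged.

    `zk` is assumed to behave like kazoo's KazooClient: get(path) returns
    (bytes, stat) and set(path, value, version=...) fails when the znode's
    version no longer matches (e.g. the Overseer wrote in between).
    """
    data, stat = zk.get(path)
    updated = mutate(json.loads(data.decode("utf-8")))
    zk.set(path, json.dumps(updated).encode("utf-8"), version=stat.version)
```

With kazoo itself you would create the client with KazooClient(hosts="..."), call start(), and pass it in; on a version mismatch kazoo raises BadVersionError, at which point you re-read and retry.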

The first thing I'd do is see what's up with the replicas not coming
up. What does the Solr log show? There must be some info there. And,
assuming you still have a data directory there, your docs _may_
be intact. You may be able to simply move them to some other Solr
instance (actually, the entire collectionX_shardY_replicaZ directory)
and bounce the Solr server there. I don't _know_ that'll work, but what
you'd hope to see is the replicas' entries in Zookeeper magically switch
to the new node's address.
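Before moving cores around, it can help to list which shards currently have no active replica on a live node. This sketch assumes replica entries carry node_name and state fields as they do in state.json, and that you have already fetched the collection's state and the children of /live_nodes.

```python
from typing import Dict, Iterable, List


def shards_without_live_replica(collection_state: Dict, live_nodes: Iterable[str]) -> List[str]:
    """Shards none of whose replicas is 'active' on a node listed in /live_nodes.

    Assumes each replica entry has 'node_name' (e.g. '10.0.0.5:8983_solr')
    and 'state' keys, as in state.json.
    """
    live = set(live_nodes)
    result = []
    for shard, info in sorted(collection_state["shards"].items()):
        replicas = info.get("replicas", {}).values()
        serving = any(
            r.get("node_name") in live and r.get("state") == "active" for r in replicas
        )
        if not serving:
            result.append(shard)
    return result
```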

All that said, if the node is somehow permanently gone, you have to
re-index anyway to get the data back, so recreating the collection
would be less fooling around.

Best,
Erick

On Wed, Jun 8, 2016 at 2:40 AM, Henrik Brautaset Aronsen
<he...@synth.no> wrote:
> [snip]