Posted to solr-user@lucene.apache.org by Nikolay Shuyskiy <ni...@genestack.com> on 2015/10/22 16:29:48 UTC

Split shard onto new physical volumes

Hello.

We have a Solr 5.3.0 installation with an index of roughly 4 TB, and the volume
containing it is almost full. I hoped to use SolrCloud to split the index into
two shards on separate Solr nodes, thus spreading it across several physical
devices. But on closer inspection it turns out that splitting a shard creates
the two new shards *on the same node* (and on the same storage volume), so this
is not possible when the volume is more than half full.

I imagined that I could, say, add two new nodes to SolrCloud and split the
shard so that the two new shards ("halves" of the one being split) would be
created on those new nodes.

Right now, the only way I see to split a shard in my situation is to create
two directories (shard_1_0 and shard_1_1) and mount new volumes onto them
*before* calling SPLITSHARD. Then I would be able to split the shard, and
after adding two new nodes, the new shards would be replicated there and I
could clean up all the data on the first node.

Please advise me on this; I hope I've missed something that would ease this
kind of scaling.

-- 
Yrs sincerely,
  Nikolay Shuyskiy

Re: Split shard onto new physical volumes

Posted by Nikolay Shuyskiy <ni...@genestack.com>.
> On Tue, Oct 27, 2015, at 10:50 AM, Nikolay Shuyskiy wrote:
>> On 2015-10-22 17:54:44, Shawn Heisey <ap...@elyograg.org> wrote:
>>
>>> On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
>> >> I imagined that I could, say, add two new nodes to SolrCloud, and split
>> >> shard so that two new shards ("halves" of the one being split) will be
>> >> created on those new nodes.
>> >>
>> >> Right now the only way to split shard in my situation I see is to create
>> >> two directories (shard_1_0 and shard_1_1) and mount new volumes onto
>> >> them *before* calling SPLITSHARD. Then I would be able to split shards,
>> >> and after adding two new nodes, these new shards will be replicated, and
>> >> I'll be able to clean up all the data on the first node.
>> >
>> > The reason that they must be on the same node is because index splitting
>> > is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
>> > the one index on the one machine.
>> >
>> > Depending on the overall cloud distribution, one option *might* be to
>> > add a replica of the shard you want to split to one or more new nodes
>> > with plenty of disk space, and after it is replicated, delete it from
>> > any nodes where the disk is nearly full.  Then do the split operation,
>> > and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
>> > the way you want it.
>> Thank you, that makes sense and is a usable alternative for us for the
>> time being.
>> Probably we have to consider using implicit routing for the future so
>> that we could add new nodes without dealing with splitting.
>
> Depends upon the use-case. For things like log files, use time based
> collections, then create/destroy collection aliases to point to them.
>
> I've had a "today" alias that points to logs_20151027 and logs_20151026,
> meaning all content for the last 24hrs is available via
> http://localhost:8983/solr/today. I had "week" and "month" also.
>
> Dunno if that works for you.
Thanks for sharing your experience, but in our case any kind of time-based
splitting is irrelevant. If worst comes to worst, we can impose some kind of
pre-grouping on our documents (thank you for the idea!), but it would
complicate application logic (and, I'm afraid, Solr maintenance) too much
for our taste.

-- 
Yrs sincerely,
  Nikolay Shuyskiy

Re: Split shard onto new physical volumes

Posted by Upayavira <uv...@odoko.co.uk>.

On Tue, Oct 27, 2015, at 10:50 AM, Nikolay Shuyskiy wrote:
> On 2015-10-22 17:54:44, Shawn Heisey <ap...@elyograg.org> wrote:
> 
> > On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
> >> I imagined that I could, say, add two new nodes to SolrCloud, and split
> >> shard so that two new shards ("halves" of the one being split) will be
> >> created on those new nodes.
> >>
> >> Right now the only way to split shard in my situation I see is to create
> >> two directories (shard_1_0 and shard_1_1) and mount new volumes onto
> >> them *before* calling SPLITSHARD. Then I would be able to split shards,
> >> and after adding two new nodes, these new shards will be replicated, and
> >> I'll be able to clean up all the data on the first node.
> >
> > The reason that they must be on the same node is because index splitting
> > is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
> > the one index on the one machine.
> >
> > Depending on the overall cloud distribution, one option *might* be to
> > add a replica of the shard you want to split to one or more new nodes
> > with plenty of disk space, and after it is replicated, delete it from
> > any nodes where the disk is nearly full.  Then do the split operation,
> > and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
> > the way you want it.
> Thank you, that makes sense and is a usable alternative for us for the  
> time being.
> Probably we have to consider using implicit routing for the future so that
> we could add new nodes without dealing with splitting.

Depends upon the use case. For things like log files, use time-based
collections, then create/destroy collection aliases to point to them.

I've had a "today" alias that points to logs_20151027 and logs_20151026,
meaning all content for the last 24hrs is available via
http://localhost:8983/solr/today. I had "week" and "month" also.
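
The alias management is just a Collections API call, roughly like this (the
host is a placeholder; the alias and collection names are the ones from my
setup above):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=today&collections=logs_20151027,logs_20151026'

A nightly job re-issues CREATEALIAS with the current day's collections;
creating an alias with an existing name simply replaces its definition.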

Dunno if that works for you.

Upayavira

Re: Split shard onto new physical volumes

Posted by Nikolay Shuyskiy <ni...@genestack.com>.
On 2015-10-22 17:54:44, Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
>> I imagined that I could, say, add two new nodes to SolrCloud, and split
>> shard so that two new shards ("halves" of the one being split) will be
>> created on those new nodes.
>>
>> Right now the only way to split shard in my situation I see is to create
>> two directories (shard_1_0 and shard_1_1) and mount new volumes onto
>> them *before* calling SPLITSHARD. Then I would be able to split shards,
>> and after adding two new nodes, these new shards will be replicated, and
>> I'll be able to clean up all the data on the first node.
>
> The reason that they must be on the same node is because index splitting
> is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
> the one index on the one machine.
>
> Depending on the overall cloud distribution, one option *might* be to
> add a replica of the shard you want to split to one or more new nodes
> with plenty of disk space, and after it is replicated, delete it from
> any nodes where the disk is nearly full.  Then do the split operation,
> and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
> the way you want it.
Thank you, that makes sense and is a usable alternative for us for the
time being.
We should probably consider using implicit routing in the future, so that we
can add new nodes without having to split at all.
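
For the record, going the implicit-routing route would mean creating the
collection with named shards and a routing field, something like this (all
names here are invented for illustration):

  curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&router.name=implicit&shards=shardA,shardB&router.field=shard_label&collection.configName=myconfig'

New shards could then be added later with CREATESHARD, and each document would
be sent to the shard named in its shard_label field, so we would never need
SPLITSHARD.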

-- 
Yrs sincerely,
  Nikolay Shuyskiy

Re: Split shard onto new physical volumes

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/22/2015 8:29 AM, Nikolay Shuyskiy wrote:
> I imagined that I could, say, add two new nodes to SolrCloud, and split
> shard so that two new shards ("halves" of the one being split) will be
> created on those new nodes.
> 
> Right now the only way to split shard in my situation I see is to create
> two directories (shard_1_0 and shard_1_1) and mount new volumes onto
> them *before* calling SPLITSHARD. Then I would be able to split shards,
> and after adding two new nodes, these new shards will be replicated, and
> I'll be able to clean up all the data on the first node.

The reason that they must be on the same node is that index splitting
is a *Lucene* operation, and Lucene has no knowledge of Solr nodes, only
the one index on the one machine.

Depending on the overall cloud distribution, one option *might* be to
add a replica of the shard you want to split to one or more new nodes
with plenty of disk space, and after it is replicated, delete it from
any nodes where the disk is nearly full.  Then do the split operation,
and once it's done, use ADDREPLICA/DELETEREPLICA to arrange everything
the way you want it.
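
As a rough sketch of that sequence against the Collections API (the
collection, shard, node, and replica names are placeholders; the real ones
show up in CLUSTERSTATUS):

  # 1. put a copy of the shard on a node with free disk space
  curl 'http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1&node=newhost:8983_solr'
  # 2. once it is in sync, drop the replica on the nearly-full node
  curl 'http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycollection&shard=shard1&replica=core_node1'
  # 3. split on the roomier node, then rearrange replicas as needed
  curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1'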

Thanks,
Shawn