Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@gmail.com> on 2013/11/19 21:02:43 UTC

Split shard and stream sub-shards to remote nodes?

Hi,

Is it possible to perform a shard split and stream data for the
new/sub-shards to remote nodes, avoiding persistence of new/sub-shards
on the local/source node first?

Thanks,
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

Re: Split shard and stream sub-shards to remote nodes?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Ouch :(
I guess it's as efficient as it can be.... but too bad, because writing to
a remote node sounds awesomely cool.... to me at least. :)

Thanks for explaining the key bits, Shalin.

Otis



Re: Split shard and stream sub-shards to remote nodes?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
The splitting process is nothing but the creation of a bitset with
which a LiveDocsReader is created. These readers are then added to
a new index via the IW.addIndexes(IndexReader[] readers) method. All this
is performed below the IR/IW API and no documents are actually ever
read or written directly by Solr. This is why it isn't feasible to
stream docs to a remote node.
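
To make the bitset idea concrete, here is a minimal sketch in plain Python. It is not Solr's or Lucene's code: the function names, the sample ids, and the CRC32-modulo routing rule are all assumptions chosen for illustration (Solr assigns documents by hash range). The point it shows is that a split only decides which positions each sub-shard's reader exposes; no document is read or rewritten along the way.

```python
import zlib


def split_bitsets(doc_ids, num_sub_shards=2):
    """Return one bitset (stored as a Python int) per sub-shard.

    Bit i of a bitset is set when the document at position i belongs
    to that sub-shard. Only the bitsets are built; documents are not copied.
    """
    bitsets = [0] * num_sub_shards
    for position, doc_id in enumerate(doc_ids):
        # Stand-in routing rule (an assumption, not Solr's hash ranges).
        target = zlib.crc32(doc_id.encode("utf-8")) % num_sub_shards
        bitsets[target] |= 1 << position
    return bitsets


def live_positions(bitset, num_docs):
    """List the positions that a reader filtered by this bitset would expose."""
    return [i for i in range(num_docs) if (bitset >> i) & 1]


# Hypothetical document ids, used only to exercise the sketch.
doc_ids = ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]
bitsets = split_bitsets(doc_ids)
views = [live_positions(b, len(doc_ids)) for b in bitsets]
```

Each filtered view would then be handed to the index writer as a whole reader, which is why there is no per-document hook at which docs could be forwarded elsewhere.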




-- 
Regards,
Shalin Shekhar Mangar.

Re: Split shard and stream sub-shards to remote nodes?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

On Wed, Nov 20, 2013 at 12:53 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> At the Lucene level, I think it would require a directory
> implementation which writes to a remote node directly. Otherwise, on
> the solr side, we must move the leader itself to another node which
> has enough disk space and then split the index.
>

Hm.... what about taking the source shard, splitting it, and sending docs
that come out of each sub-shard to a remote node at Solr level, as if
these documents are just being added (i.e. nothing at Lucene level)?

Otis





Re: Split shard and stream sub-shards to remote nodes?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
At the Lucene level, I think it would require a directory
implementation which writes to a remote node directly. Otherwise, on
the Solr side, we must move the leader itself to another node which
has enough disk space and then split the index.
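
Once the leader sits on a node with enough disk, the split itself is an ordinary Collections API SPLITSHARD request. The sketch below only builds the request URL and sends nothing; the host, collection, and shard names are hypothetical placeholders.

```python
from urllib.parse import urlencode


def split_shard_url(base_url, collection, shard):
    """Build a Collections API SPLITSHARD URL (no request is sent)."""
    params = urlencode(
        {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    )
    return f"{base_url}/admin/collections?{params}"


# Hypothetical host, collection, and shard names.
url = split_shard_url("http://node-with-space:8983/solr", "collection1", "shard1")
```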




-- 
Regards,
Shalin Shekhar Mangar.

Re: Split shard and stream sub-shards to remote nodes?

Posted by Otis Gospodnetic <ot...@gmail.com>.
Do you think this is something that is actually implementable?  If so,
I'll open an issue.

One use-case where this may come in handy is when the disk space is
tight.  If a shard is using > 50% of the disk space on some node X,
you can't really split that shard because the 2 new sub-shards will
not fit on the local disk.  Or is there some trick one could use in
this situation?
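
The arithmetic behind the 50% figure, as a rough sketch. It assumes that the two sub-shards together need about as much space as the parent and that the parent stays on disk until the split finishes; real indexes may need extra headroom. The function and parameter names are made up for illustration.

```python
def can_split_locally(disk_total_bytes, shard_bytes, other_used_bytes=0):
    """Return True if the sub-shards could fit next to the parent shard.

    Assumption: the sub-shards together take roughly shard_bytes, and the
    parent remains on disk for the duration of the split.
    """
    free = disk_total_bytes - shard_bytes - other_used_bytes
    return free >= shard_bytes


GB = 1024 ** 3
fits_at_40_percent = can_split_locally(100 * GB, 40 * GB)  # 60 GB free, 40 GB needed
fits_at_60_percent = can_split_locally(100 * GB, 60 * GB)  # 40 GB free, 60 GB needed
```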

Thanks,
Otis



Re: Split shard and stream sub-shards to remote nodes?

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
No, it is not supported yet. We can't split to a remote node directly.
The best bet is to trigger a new leader election by unloading the leader
node once all replicas are active.
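
A small sketch of that workaround: build the CoreAdmin UNLOAD request for the leader core only when every other replica reports "active". The replica-state dictionary, the host, and the core names are assumptions for illustration; in practice the states would come from the cluster state, and nothing is sent here.

```python
from urllib.parse import urlencode


def unload_leader_url(base_url, leader_core, replica_states):
    """Return a CoreAdmin UNLOAD URL for the leader core, or None.

    replica_states maps each non-leader replica name to its state string.
    None is returned when there are no replicas or any is not "active",
    because unloading the leader then would leave nothing to elect.
    """
    if not replica_states:
        return None
    if any(state != "active" for state in replica_states.values()):
        return None
    params = urlencode({"action": "UNLOAD", "core": leader_core})
    return f"{base_url}/admin/cores?{params}"


# Hypothetical host, core, and replica names.
ready = unload_leader_url(
    "http://node-x:8983/solr",
    "collection1_shard1_replica1",
    {"collection1_shard1_replica2": "active"},
)
not_ready = unload_leader_url(
    "http://node-x:8983/solr",
    "collection1_shard1_replica1",
    {"collection1_shard1_replica2": "recovering"},
)
```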




-- 
Regards,
Shalin Shekhar Mangar.