Posted to solr-user@lucene.apache.org by roz dev <ro...@gmail.com> on 2012/02/27 09:26:36 UTC

Solr Cloud, Commits and Master/Slave configuration

Hi All,

I am trying to understand the features of SolrCloud regarding commits and
scaling.

   - If I am using SolrCloud, do I need to explicitly call commit
   (hard commit)? Or is a soft commit enough, with SolrCloud taking care of
   writing to disk?
   - Do we still need a Master/Slave setup to scale searching? If we do,
   do I need to issue a hard commit to make my changes visible to the
   slaves?
   - If I use NRT with a Master/Slave setup and soft commits, will the
   slaves be able to see changes made on the master with a soft commit?

Any input is welcome.

Thanks

-Saroj

Re: Solr Cloud, Commits and Master/Slave configuration

Posted by eks dev <ek...@yahoo.co.uk>.
Thanks Mark,
Good, this is probably good enough to give it a try. My analyzers are
normally fast, so doing duplicate analysis (at each replica) is
probably not going to cost a lot, as long as there is some decent "batching".

Can this be controlled somehow (depth of the buffer, time until flush,
or some such)? Which "events" trigger flushing to the replicas
(softCommit, commit, something new)?

What I have found useful is to always think in terms of incremental (low
latency) and batch (high throughput) updates. I then just need some
knobs to tweak the behavior of the update process.
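To make the two knobs asked about above (buffer depth and time until flush) concrete, here is a minimal Python sketch of how an incremental profile and a batch profile might differ. `FlushPolicy`, `should_flush`, and the `INCREMENTAL`/`BATCH` presets are hypothetical names for illustration only; they are not SolrCloud settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlushPolicy:
    """Hypothetical knobs for forwarding buffered updates to replicas."""
    max_docs: int          # buffer depth: flush once this many docs are waiting
    max_wait_secs: float   # time until flush: flush once the oldest doc is this old

    def should_flush(self, buffered_docs: int, oldest_age_secs: float) -> bool:
        # Nothing buffered means nothing to send.
        if buffered_docs == 0:
            return False
        # Flush when either the depth limit or the age limit is reached.
        return buffered_docs >= self.max_docs or oldest_age_secs >= self.max_wait_secs


# Incremental updates: favour latency, forward every document immediately.
INCREMENTAL = FlushPolicy(max_docs=1, max_wait_secs=0.0)
# Batch updates: favour throughput, forward large batches less often.
BATCH = FlushPolicy(max_docs=1000, max_wait_secs=5.0)
```

The time limit acts as a safety net for the batch profile: a half-full buffer is still forwarded once its oldest document has waited long enough.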

I would really like to move away from Master/Slave; Cloud makes a lot
of things way simpler for us users. I will give it a try in a couple
of weeks.

Later we could even think about adding segment-level replication as an
option for "extremely expensive analysis, batch cases" or "initial
cluster seeding". But that is just an optimization.

Cheers,
eks


On Thu, Mar 1, 2012 at 5:24 AM, Mark Miller <ma...@gmail.com> wrote:

Re: Solr Cloud, Commits and Master/Slave configuration

Posted by Mark Miller <ma...@gmail.com>.
We actually do batch updates currently; we are being somewhat loose when we say "a document at a time". There is a buffer of updates per replica that gets flushed depending on the requests coming through and the buffer size.
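A minimal Python sketch of the per-replica buffering described above, assuming two flush triggers: the buffer reaching a size limit, and the end of the incoming request. `ReplicaBuffer`, `handle_request`, and the `send` callback are hypothetical; this models the idea, not SolrCloud's actual classes.

```python
from typing import Callable, Dict, List


class ReplicaBuffer:
    """Hypothetical buffer of pending updates for one replica."""

    def __init__(self, replica: str, max_size: int,
                 send: Callable[[str, List[dict]], None]) -> None:
        self.replica = replica
        self.max_size = max_size
        self.send = send              # forwards one batch to the replica
        self.pending: List[dict] = []

    def add(self, doc: dict) -> None:
        self.pending.append(doc)
        if len(self.pending) >= self.max_size:   # size-based flush
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.send(self.replica, self.pending)
            self.pending = []                    # start a fresh batch


def handle_request(docs: List[dict], buffers: Dict[str, ReplicaBuffer]) -> None:
    """Leader side: buffer each doc for every replica, flush at end of request."""
    for doc in docs:
        for buf in buffers.values():
            buf.add(doc)
    for buf in buffers.values():                 # request finished: drain leftovers
        buf.flush()
```

With `max_size=2` and a five-document request, each replica receives batches of 2, 2 and 1 documents instead of five single-document commands.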

- Mark Miller
lucidimagination.com

On Feb 28, 2012, at 3:38 AM, eks dev wrote:

> SolrCluod is going to be great, NRT feature is really huge step
> forward, as well as central configuration, elasticity ...
> 
> The only thing I do not yet understand is treatment of cases that were
> traditionally covered by Master/Slave setup. Batch update
> 
> If I get it right (?), updates to replicas are sent one by one,
> meaning when one server receives update, it gets forwarded to all
> replicas. This is great for reduced update latency case, but I do not
> know how is it implemented if you hit it with "batch" update. This
> would cause huge amount of update commands going to replicas. Not so
> good for throughput.
> 
> - Master slave does distribution at segment level, (no need to
> replicate analysis, far less network traffic). Good for batch updates
> - SolrCloud does par update command (low latency, but chatty and
> Analysis step is done N_Servers times). Good for incremental updates
> 
> Ideally, some sort of "batching" is going to be available in
> SolrCloud, and some cont roll over it, e.g. forward batches of 1000
> documents (basically keep update log slightly longer and forward it as
> a batch update command). This would still cause duplicate analysis,
> but would reduce network traffic.
> 
> Please bare in mind, this is more of a question than a statement,  I
> didn't look at the cloud code. It might be I am completely wrong here!
> 
> 
> 
> 
> 
> On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson <er...@gmail.com> wrote:
>> As I understand it (and I'm just getting into SolrCloud myself), you can
>> essentially forget about master/slave stuff. If you're using NRT,
>> the soft commit will make the docs visible, you don't ned to do a hard
>> commit (unlike the master/slave days). Essentially, the update is sent
>> to each shard leader and then fanned out into the replicas for that
>> leader. All automatically. Leaders are elected automatically. ZooKeeper
>> is used to keep the cluster information.
>> 
>> Additionally, SolrCloud keeps a transaction log of the updates, and replays
>> them if the indexing is interrupted, so you don't risk data loss the way
>> you used to.
>> 
>> There aren't really masters/slaves in the old sense any more, so
>> you have to get out of that thought-mode (it's hard, I know).
>> 
>> The code is under pretty active development, so any feedback is
>> valuable....
>> 
>> Best
>> Erick
>> 
>> On Mon, Feb 27, 2012 at 3:26 AM, roz dev <ro...@gmail.com> wrote:
>>> Hi All,
>>> 
>>> I am trying to understand features of Solr Cloud, regarding commits and
>>> scaling.
>>> 
>>> 
>>>   - If I am using Solr Cloud then do I need to explicitly call commit
>>>   (hard-commit)? Or, a soft commit is okay and Solr Cloud will do the job of
>>>   writing to disk?
>>> 
>>> 
>>>   - Do We still need to use  Master/Slave setup to scale searching? If we
>>>   have to use Master/Slave setup then do i need to issue hard-commit to make
>>>   my changes visible to slaves?
>>>   - If I were to use NRT with Master/Slave setup with soft commit then
>>>   will the slave be able to see changes made on master with soft commit?
>>> 
>>> Any inputs are welcome.
>>> 
>>> Thanks
>>> 
>>> -Saroj













Re: Solr Cloud, Commits and Master/Slave configuration

Posted by eks dev <ek...@yahoo.co.uk>.
SolrCloud is going to be great. The NRT feature is a really huge step
forward, as are central configuration, elasticity, ...

The only thing I do not yet understand is the treatment of cases that were
traditionally covered by a Master/Slave setup, such as batch updates.

If I get it right (?), updates to replicas are sent one by one,
meaning that when one server receives an update, it gets forwarded to all
replicas. This is great for reducing update latency, but I do not
know how it works if you hit it with a "batch" update. That
would send a huge number of update commands to the replicas, which is not
so good for throughput.

- Master/Slave distributes at the segment level (no need to repeat
analysis, far less network traffic). Good for batch updates.
- SolrCloud distributes per update command (low latency, but chatty, and
the analysis step is done N_Servers times). Good for incremental updates.

Ideally, some sort of "batching" will be available in
SolrCloud, with some control over it, e.g. forwarding batches of 1000
documents (basically keeping the update log slightly longer and forwarding it as
a batch update command). This would still cause duplicate analysis,
but it would reduce network traffic.
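The trade-off described above can be put into rough numbers. This Python sketch counts analysis passes and update requests sent to replicas for one indexing run, under simplifying assumptions: all documents go to one shard with `n_servers` copies, Master/Slave analyses each document once on the master and the slaves copy segment files instead of receiving update commands, and SolrCloud analyses each document on every copy. The function name and the cost model are illustrative only.

```python
import math


def indexing_cost(n_docs: int, n_servers: int, batch_size: int = 1) -> dict:
    """Rough cost model for indexing n_docs into one shard with n_servers copies.

    batch_size=1 models forwarding one update command per document;
    larger values model forwarding batches from the leader to each replica.
    """
    replicas = n_servers - 1  # copies other than the one receiving the update
    return {
        # Master/Slave: analysis only on the master; slaves pull segments.
        "master_slave_analysis": n_docs,
        "master_slave_update_requests": 0,
        # SolrCloud: every copy analyses every document.
        "cloud_analysis": n_docs * n_servers,
        # Each forwarded batch costs one request per replica.
        "cloud_update_requests": math.ceil(n_docs / batch_size) * replicas,
    }
```

For 10,000 documents on three servers, batches of 1000 cut the update requests to replicas from 20,000 to 20, while analysis still runs 30,000 times: batching reduces network traffic but not duplicate analysis.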

Please bear in mind, this is more of a question than a statement; I
didn't look at the cloud code. It might be that I am completely wrong here!





On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson <er...@gmail.com> wrote:

Re: Solr Cloud, Commits and Master/Slave configuration

Posted by Erick Erickson <er...@gmail.com>.
As I understand it (and I'm just getting into SolrCloud myself), you can
essentially forget about master/slave stuff. If you're using NRT,
the soft commit will make the docs visible; you don't need to do a hard
commit (unlike in the master/slave days). Essentially, the update is sent
to each shard leader and then fanned out to the replicas for that
leader, all automatically. Leaders are elected automatically, and ZooKeeper
is used to keep the cluster information.
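For reference, a soft or hard commit can be requested with parameters on an update request. This Python sketch only builds the request URLs and sends nothing; the host, port, and collection name in `SOLR_UPDATE_URL` are placeholders, and it assumes the `softCommit=true` and `commit=true` request parameters of Solr's update handler. How often hard commits are still needed depends on your configuration and Solr version.

```python
from urllib.parse import urlencode

# Placeholder endpoint; adjust host, port and collection for a real cluster.
SOLR_UPDATE_URL = "http://localhost:8983/solr/collection1/update"


def update_url(soft: bool) -> str:
    """Build an update-request URL that ends with a soft or a hard commit."""
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return SOLR_UPDATE_URL + "?" + urlencode(params)
```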

Additionally, SolrCloud keeps a transaction log of the updates, and replays
them if the indexing is interrupted, so you don't risk data loss the way
you used to.
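To illustrate the transaction-log idea, here is a toy Python model, not Solr's implementation: each update is appended to a log before it is applied to an in-memory index, so if that index state is lost, replaying the log rebuilds it. `ToyCore`, its in-memory `log`, and the `crash` simulation are hypothetical; a real transaction log is written to disk.

```python
from typing import Dict, List, Tuple


class ToyCore:
    """Toy model of an index with an update log; not Solr's actual classes."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, dict]] = []   # stands in for the on-disk tlog
        self.index: Dict[str, dict] = {}        # stands in for unflushed index state

    def add(self, doc: dict) -> None:
        self.log.append(("add", doc))           # log first...
        self.index[doc["id"]] = doc             # ...then apply

    def delete(self, doc_id: str) -> None:
        self.log.append(("delete", {"id": doc_id}))
        self.index.pop(doc_id, None)

    def crash(self) -> None:
        self.index = {}                         # in-memory state lost, log survives

    def replay(self) -> None:
        """Rebuild index state by re-applying every logged update in order."""
        self.index = {}
        for op, doc in self.log:
            if op == "add":
                self.index[doc["id"]] = doc
            else:
                self.index.pop(doc["id"], None)
```

Replaying in log order matters: a delete logged after an add must also be applied after it, or the deleted document would reappear.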

There aren't really masters/slaves in the old sense any more, so
you have to get out of that thought-mode (it's hard, I know).

The code is under pretty active development, so any feedback is
valuable.

Best
Erick

On Mon, Feb 27, 2012 at 3:26 AM, roz dev <ro...@gmail.com> wrote: