Posted to solr-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2012/03/01 05:24:21 UTC

Re: Solr Cloud, Commits and Master/Slave configuration

We actually do currently batch updates - we are being somewhat loose when we say a document at a time. There is a buffer of updates per replica that gets flushed depending on the requests coming through and the buffer size.
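A minimal sketch of the kind of per-replica buffering described above, assuming a simple size limit plus a flush when the incoming request ends. This is not SolrCloud's actual code: the class `ReplicaUpdateBuffer`, the `max_buffered` parameter, and the `send` callback are all hypothetical names chosen for illustration.

```python
from collections import defaultdict


class ReplicaUpdateBuffer:
    """Hypothetical per-replica update buffer; not Solr's implementation."""

    def __init__(self, replicas, max_buffered, send):
        self.replicas = list(replicas)
        self.max_buffered = max_buffered  # assumed "buffer size" limit
        self.send = send                  # callable(replica, list_of_docs)
        self.pending = defaultdict(list)

    def add(self, doc):
        # Queue the document for every replica and flush a replica's
        # queue as soon as it reaches the size limit.
        for replica in self.replicas:
            queue = self.pending[replica]
            queue.append(doc)
            if len(queue) >= self.max_buffered:
                self.flush(replica)

    def flush(self, replica):
        # Forward everything queued for one replica as a single batch.
        docs = self.pending[replica]
        if docs:
            self.send(replica, docs)
            self.pending[replica] = []

    def finish_request(self):
        # Called when the incoming request ends, so nothing stays buffered.
        for replica in self.replicas:
            self.flush(replica)


sent = []
buffer = ReplicaUpdateBuffer(
    ["replica1", "replica2"],
    max_buffered=3,
    send=lambda replica, docs: sent.append((replica, len(docs))),
)
for i in range(7):
    buffer.add({"id": i})
buffer.finish_request()
print(sent)
```

With seven documents and a limit of three, each replica receives three batches (3, 3, and 1 documents) rather than seven separate update commands.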

- Mark Miller
lucidimagination.com

On Feb 28, 2012, at 3:38 AM, eks dev wrote:

> SolrCloud is going to be great; the NRT feature is a really huge step
> forward, as well as central configuration, elasticity ...
> 
> The only thing I do not yet understand is the treatment of cases that were
> traditionally covered by a Master/Slave setup: batch updates.
> 
> If I get it right (?), updates to replicas are sent one by one,
> meaning that when one server receives an update, it gets forwarded to all
> replicas. This is great for the reduced-update-latency case, but I do not
> know how it is implemented if you hit it with a "batch" update. This
> would cause a huge number of update commands going to replicas. Not so
> good for throughput.
> 
> - Master/Slave does distribution at the segment level (no need to
> replicate analysis, far less network traffic). Good for batch updates.
> - SolrCloud does distribution per update command (low latency, but chatty,
> and the analysis step is done N_Servers times). Good for incremental updates.
> 
> Ideally, some sort of "batching" is going to be available in
> SolrCloud, with some control over it, e.g. forward batches of 1000
> documents (basically keep the update log slightly longer and forward it as
> a batch update command). This would still cause duplicate analysis,
> but would reduce network traffic.
> 
> Please bear in mind, this is more of a question than a statement; I
> didn't look at the cloud code. It might be that I am completely wrong here!
> 
> On Tue, Feb 28, 2012 at 4:01 AM, Erick Erickson <er...@gmail.com> wrote:
>> As I understand it (and I'm just getting into SolrCloud myself), you can
>> essentially forget about master/slave stuff. If you're using NRT,
>> the soft commit will make the docs visible; you don't need to do a hard
>> commit (unlike the master/slave days). Essentially, the update is sent
>> to each shard leader and then fanned out into the replicas for that
>> leader. All automatically. Leaders are elected automatically. ZooKeeper
>> is used to keep the cluster information.
>> 
>> Additionally, SolrCloud keeps a transaction log of the updates, and replays
>> them if the indexing is interrupted, so you don't risk data loss the way
>> you used to.
>> 
>> There aren't really masters/slaves in the old sense any more, so
>> you have to get out of that thought-mode (it's hard, I know).
>> 
>> The code is under pretty active development, so any feedback is
>> valuable....
>> 
>> Best
>> Erick
>> 
>> On Mon, Feb 27, 2012 at 3:26 AM, roz dev <ro...@gmail.com> wrote:
>>> Hi All,
>>> 
>>> I am trying to understand features of Solr Cloud, regarding commits and
>>> scaling.
>>> 
>>> 
>>>   - If I am using Solr Cloud, do I need to explicitly call commit
>>>   (hard commit)? Or is a soft commit okay, with Solr Cloud doing the job
>>>   of writing to disk?
>>>   - Do we still need to use a Master/Slave setup to scale searching? If we
>>>   have to use a Master/Slave setup, do I need to issue a hard commit to
>>>   make my changes visible to slaves?
>>>   - If I were to use NRT with a Master/Slave setup and soft commits,
>>>   will the slave be able to see changes made on the master with a soft
>>>   commit?
>>> 
>>> Any inputs are welcome.
>>> 
>>> Thanks
>>> 
>>> -Saroj

Re: Solr Cloud, Commits and Master/Slave configuration

Posted by eks dev <ek...@yahoo.co.uk>.
Thanks Mark,
Good, this is probably good enough to give it a try. My analyzers are
normally fast, so doing duplicate analysis (at each replica) is
probably not going to cost a lot if there is some decent "batching".

Can this be controlled somehow (depth of this buffer, time until flush,
or some such)? Which "events" trigger this flushing to replicas
(softCommit, commit, something new)?

What I found useful is to always think in terms of incremental (low
latency) and batch (high throughput) updates. I just then need some
knobs to tweak behavior of this update process.
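To put rough numbers on the incremental-versus-batch trade-off, here is a small sketch that counts how many forwarding requests a leader would send to its replicas for a given batch size. The function name `forwarded_requests` and the document and replica counts are illustrative assumptions, not measurements or anything taken from Solr. Note that batching only reduces the number of requests: each replica would still analyze every document.

```python
import math


def forwarded_requests(num_docs, num_replicas, batch_size):
    """Requests a leader sends to its replicas for num_docs updates."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return num_replicas * math.ceil(num_docs / batch_size)


# 100,000 documents and 2 replicas (illustrative numbers only)
one_by_one = forwarded_requests(100_000, 2, batch_size=1)
batched = forwarded_requests(100_000, 2, batch_size=1000)
print(one_by_one, batched)  # 200000 200
```

A batch size of 1 corresponds to the incremental, low-latency case; a larger batch size trades some latency for far fewer requests.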

I would really like to move away from Master/Slave; Cloud makes a lot
of things way simpler for us users ... I will give it a try in a couple
of weeks.

Later we can even think about putting replication at segment level for
"extremely expensive analysis, batch cases", or "initial cluster
seeding" as a replication option. But this is then just an
optimization.

Cheers,
eks

