Posted to solr-user@lucene.apache.org by Francis Rhys-Jones <fr...@guardian.co.uk> on 2010/12/22 18:10:14 UTC

Configuration option for disableReplication

Hi,

I am looking into using a multi-core configuration to allow us to
fully rebuild our index while still applying updates.

I have two cores, main-core and rebuild-core. I push the whole dataset
into the rebuild core, during which time I can happily keep pushing
updates into the main-core. Once the rebuild is complete I swap the
cores and delete *:* from the rebuild core.
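
A rough sketch of the swap-and-clear step, in Python. It only builds the
requests and sends nothing. The CoreAdmin SWAP action and the
delete-by-query body are standard Solr calls; the base URL and the helper
names are assumptions made for illustration.

```python
from urllib.parse import urlencode


def swap_cores_url(solr_base, core, other):
    """Build a CoreAdmin SWAP URL that exchanges the names of two cores."""
    query = urlencode({"action": "SWAP", "core": core, "other": other})
    return f"{solr_base}/admin/cores?{query}"


def clear_core_request(solr_base, core):
    """Return (url, body) for a delete-by-query that empties a core and commits."""
    url = f"{solr_base}/{core}/update?commit=true"
    body = "<delete><query>*:*</query></delete>"
    return url, body


# Example (assumed host and port):
#   swap_cores_url("http://localhost:8983/solr", "main-core", "rebuild-core")
#   clear_core_request("http://localhost:8983/solr", "rebuild-core")
```

The body from clear_core_request would be POSTed with a text/xml content
type once the swap has succeeded.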

This works fine; however, there are a couple of edge cases:

On server restart, Solr needs to remember which core has been swapped
in to be the main core. This can be solved by adding the
persistent=true attribute to the solr config; however, this does
require the solr.xml to be writeable.

While deploying a new version of our application we overwrite the
solr.xml, as the new version could potentially have legitimate changes
to the solr.xml that need to be rolled out, again leaving the cores
out of sync.

My proposed solution is to have the indexing process do some sanity
checking at the start of each run, and swap in the correct core if
necessary.
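
One way such a sanity check might look, sketched in Python. The rule used
here is an assumption, not something stated above: since the rebuild core
is emptied after every swap, a main-core with no documents next to a
rebuild-core that still has some is taken to mean the cores are the wrong
way round. The counts could come from the numFound of a *:* query with
rows=0 against each core.

```python
def needs_swap(main_num_docs, rebuild_num_docs):
    """Decide whether the cores look reversed (assumed heuristic).

    Swap only when main-core is empty and rebuild-core is not. If both
    hold documents, a rebuild may simply be in progress, so leave them.
    """
    return main_num_docs == 0 and rebuild_num_docs > 0
```

A stricter check (for example a marker document written into the live
index) would avoid relying on document counts alone.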

This works; however, there is the potential for the slaves to start
replicating the empty index before the correct index is swapped in.

To get round this problem I would like to have replication disabled on start up.

Removing replicateAfter=startup has this effect, but it would be more
future-proof to be able to specify a default for the
replicationEnabled field (see SOLR-1175) in the ReplicationHandler,
stopping replication until I explicitly turn it on.
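
With replication off by default, the start of an indexing run could follow
the order sketched below in Python: swap first if needed, and only then
turn replication on. The enablereplication command and the SWAP action
exist in Solr; the per-core handler path and the function name are
assumptions for illustration, and nothing is sent here.

```python
from urllib.parse import urlencode


def startup_requests(solr_base, main_core, rebuild_core, swap_needed):
    """Return, in order, the URLs to call before the run does any work."""
    urls = []
    if swap_needed:
        # Put the correct index behind main-core before slaves can see it.
        query = urlencode({"action": "SWAP", "core": main_core, "other": rebuild_core})
        urls.append(f"{solr_base}/admin/cores?{query}")
    # Assumed path: the replication handler registered under the core name.
    urls.append(f"{solr_base}/{main_core}/replication?command=enablereplication")
    return urls
```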

The change looks fairly simple.

What do you think?

Francis
Please consider the environment before printing this email.
------------------------------------------------------------------
Visit guardian.co.uk - newspaper website of the year
www.guardian.co.uk  www.observer.co.uk

To save up to 33% when you subscribe to the Guardian and the Observer
visit http://www.guardian.co.uk/subscriber

---------------------------------------------------------------------

This e-mail and all attachments are confidential and may also
be privileged. If you are not the named recipient, please notify
the sender and delete the e-mail and all attachments immediately.
Do not disclose the contents to another person. You may not use
the information for any purpose, or store, or copy, it in any way.

Guardian News & Media Limited is not liable for any computer
viruses or other material transmitted with or as part of this
e-mail. You should employ virus checking software.

Guardian News & Media Limited

A member of Guardian Media Group plc
Registered Office
PO Box 68164
Kings Place
90 York Way
London
N1P 2AP

Registered in England Number 908396


Re: Configuration option for disableReplication

Posted by Upayavira <uv...@odoko.co.uk>.
Having played with it, I can see that it would be extremely useful to be
able to disable replication in the solrconfig.xml, and then enable it
with a URL.

So, as to your patch, I'd say yes, submit it. But do try to make it
backwards compatible. It'll make it much more likely to get accepted.

Upayavira

On Thu, 23 Dec 2010 12:12 +0000, "Francis Rhys-Jones"
<fr...@guardian.co.uk> wrote:
> I have a patch, I was looking for a bit of feedback as to whether I
> should submit it.

Re: Configuration option for disableReplication

Posted by Francis Rhys-Jones <fr...@guardian.co.uk>.
Hi,

We're running a cloud-based cluster of servers and it's not that easy to
get a list of the current slaves. Since my problem is only around the
restart/redeployment of the master, it seems an unnecessary
complication to have to start interacting with slaves as part of the
scripts that do this.

As you say, there seems to be a proliferation of features you can
enable and disable for the replication handler. Setting enable=false
for the master turns off all the features relating to the instance
being a master. This is slightly different from calling the
'disablereplication' command, which simply causes the 'indexversion'
command to return 0. That effectively stops the slaves from knowing
whether there is a new version, and hence from trying to replicate it.
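
The difference can be pictured with a toy model in Python. This is not
Solr's code: the class, its fields and the slave-side rule are simplified
assumptions, meant only to show the flag gating what 'indexversion'
reports.

```python
class MasterModel:
    """Toy stand-in for a master's replication handler (not Solr's code)."""

    def __init__(self, index_version, replication_enabled=True):
        self.index_version = index_version
        self.replication_enabled = replication_enabled

    def handle(self, command):
        if command == "disablereplication":
            self.replication_enabled = False
            return "OK"
        if command == "enablereplication":
            self.replication_enabled = True
            return "OK"
        if command == "indexversion":
            # While disabled, report 0 instead of the real version.
            return self.index_version if self.replication_enabled else 0
        raise ValueError(f"unknown command: {command}")


def slave_should_fetch(master_version, slave_version):
    """Assumed slave rule: fetch only on a non-zero, different version."""
    return master_version != 0 and master_version != slave_version
```

A default for the flag would correspond to passing
replication_enabled=False when the model is constructed.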

I'm not entirely clear whether this distinction is actually a useful
one. Combining them would be a fairly reasonable refactoring of the
update handler, but would probably have an effect on backwards
compatibility.

Having the replicateAfter parameter set to just 'commit' (i.e. not on
startup) has a similar effect to the 'disablereplication' command
until you do the first commit after startup. So this is a workable
solution for me, as the process that pushes updates and commits to
the index can also check and swap the cores before it does any work.

However, it feels like a bit of a tenuous way of disabling replication,
particularly as there is an explicit mechanism for doing so; it's just
not configurable on startup.

I have a patch, I was looking for a bit of feedback as to whether I
should submit it.

Thanks,

Francis



On 22 December 2010 21:30, Upayavira <uv...@odoko.co.uk> wrote:
> I've just done a bit of playing here, because I've spent a lot of time
> reading the SolrReplication wiki page[1], and have often wondered how
> some features interact.
>
> Unfortunately, if you specify <str name="enable">false</str> in your
> replication request handler for your master, you cannot re-enable it
> with a call to /solr/replication?command=enablereplication
>
> Therefore, it would seem your best bet is to call
> /solr/replication?command=disablepolling on all of your slaves prior to
> upgrading. Then, when you're sure everything is right, call
> /solr/replication?command=enablepolling on each slave, and you should be
> good to go.

Re: Configuration option for disableReplication

Posted by Upayavira <uv...@odoko.co.uk>.
I've just done a bit of playing here, because I've spent a lot of time
reading the SolrReplication wiki page[1], and have often wondered how
some features interact.

Unfortunately, if you specify <str name="enable">false</str> in your
replication request handler for your master, you cannot re-enable it
with a call to /solr/replication?command=enablereplication

Therefore, it would seem your best bet is to call
/solr/replication?command=disablepolling on all of your slaves prior to
upgrading. Then, when you're sure everything is right, call
/solr/replication?command=enablepolling on each slave, and you should be
good to go.
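
As a sketch of what that would involve, the Python below builds the polling
command URL for each slave. The disablepolling and enablepolling commands
are the ones named above; the list of slave base URLs and the function name
are assumptions, and obtaining that list is left to the caller.

```python
def polling_urls(slave_bases, enable):
    """Build one replication-handler URL per slave to toggle polling."""
    command = "enablepolling" if enable else "disablepolling"
    return [f"{base}/replication?command={command}" for base in slave_bases]


# Example with two hypothetical slaves:
#   polling_urls(["http://slave1:8983/solr", "http://slave2:8983/solr"], enable=False)
```

Each URL would be requested before the upgrade with enable=False, and again
afterwards with enable=True.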

I tried this, watching the request log on my master, and the incoming
replication requests did actually stop due to the disablepolling
command, so you should be fine with this approach.

Does this get you to where you want to be?

Upayavira

On Wed, 22 Dec 2010 17:10 +0000, "Francis Rhys-Jones"
<fr...@guardian.co.uk> wrote:
> To get round this problem I would like to have replication disabled on
> start up.
--- 
Enterprise Search Consultant at Sourcesense UK, 
Making Sense of Open Source