Posted to solr-user@lucene.apache.org by KaktuChakarabati <ji...@gmail.com> on 2009/08/14 05:09:08 UTC

Solr 1.4 Replication scheme

Hello,
I've recently switched over to Solr 1.4 (a recent nightly build) and have
been using the new replication.
Some questions come to mind:

In the old replication, I could snappull on multiple slaves asynchronously
but perform the snapinstall on each at the same time (+- epsilon seconds),
so that production load-balanced query serving would always be
consistent.

With the new system it seems that I have no control over syncing them;
instead, each slave polls every few minutes and decides the next cycle
based on the last time it *finished* updating, so in any case I lose
control over the synchronization of snap installation across multiple
slaves.

Also, I noticed the default poll interval is 60 seconds. For such a rapid
interval, what I mentioned above would seem to be a non-issue; however, I
am not clear how this interacts with new searcher warmup. For a
considerable index size (20 million+ docs) the warmup itself is an
expensive and somewhat lengthy process, and if a new searcher opens and
warms up every minute, I am not at all sure I'll be able to serve queries
with reasonable QTimes.

Has anyone else come across these issues? Any advice/comments will be
appreciated!

Thanks,
-Chak
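
For what it's worth, one way to get back the old "pull whenever, install
everywhere at the same moment" control with the new handler is to turn
polling off and fire the fetch yourself from a single place. A rough
sketch in Python, assuming the ReplicationHandler HTTP commands documented
for Solr 1.4 (disablepoll, fetchindex); the slave URLs are placeholders:

    from urllib.request import urlopen

    # Hypothetical slave URLs; adjust host names and core path to your setup.
    SLAVES = ["http://slave1:8983/solr", "http://slave2:8983/solr"]

    def replication_cmd(base_url, command):
        # The ReplicationHandler answers plain GET requests,
        # e.g. /replication?command=fetchindex
        url = "%s/replication?command=%s" % (base_url, command)
        with urlopen(url, timeout=30) as resp:
            return resp.read()

    if __name__ == "__main__":
        for slave in SLAVES:
            replication_cmd(slave, "disablepoll")  # stop automatic polling (harmless to repeat)
        for slave in SLAVES:
            replication_cmd(slave, "fetchindex")   # pull and install now, on every slave

Run it from cron at whatever wall-clock times the installs should line up
on; fetchindex kicks off the pull in the background and returns quickly,
so the slaves start installing at essentially the same moment.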

-- 
View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24965590.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 1.4 Replication scheme

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Aug 14, 2009 at 11:53 AM, Jibo John<ji...@mac.com> wrote:
> Slightly off topic.... one question on the index file transfer mechanism
> used in the new 1.4 Replication scheme.
> Is my understanding correct that the transfer is over http?  (vs. rsync in
> the script-based snappuller)

Yes, that's correct.

-Yonik
http://www.lucidimagination.com
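
Since both the transfer and its control are plain HTTP, the handler can
also be poked directly with GET requests; for example, to peek at a
slave's replication status (a minimal sketch; the host is a placeholder,
and details is one of the 1.4 ReplicationHandler commands):

    from urllib.request import urlopen

    # Plain HTTP GET against a slave's ReplicationHandler.
    with urlopen("http://slave1:8983/solr/replication?command=details") as resp:
        print(resp.read().decode("utf-8"))  # prints the XML replication status report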

Re: Solr 1.4 Replication scheme

Posted by Jibo John <ji...@mac.com>.
Slightly off topic... one question on the index file transfer mechanism
used in the new 1.4 Replication scheme.
Is my understanding correct that the transfer is over HTTP? (vs. rsync in
the script-based snappuller)

Thanks,
-Jibo


On Aug 14, 2009, at 6:34 AM, Yonik Seeley wrote:

> Longer term, it might be nice to enable clients to specify what
> version of the index they were searching against.  This could be used
> to prevent consistency issues across different slaves, even if they
> commit at different times.  It could also be used in distributed
> search to make sure the index didn't change between phases.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> 2009/8/14 Noble Paul നോബിള്‍  नोब्ळ्  
> <no...@corp.aol.com>:
>> On Fri, Aug 14, 2009 at 2:28 PM,  
>> KaktuChakarabati<ji...@gmail.com> wrote:
>>>
>>> Hey Noble,
>>> you are right in that this will solve the problem, however it  
>>> implicitly
>>> assumes that commits to the master are infrequent enough ( so that  
>>> most
>>> polling operations yield no update and only every few polls lead  
>>> to an
>>> actual commit. )
>>> This is a relatively safe assumption in most cases, but one that  
>>> couples the
>>> master update policy with the performance of the slaves - if the  
>>> master gets
>>> updated (and committed to) frequently, slaves might face a commit  
>>> on every
>>> 1-2 poll's, much more than is feasible given new searcher warmup  
>>> times..
>>> In effect what this comes down to it seems is that i must make the  
>>> master
>>> commit frequency the same as i'd want the slaves to use - and this  
>>> is
>>> markedly different than previous behaviour with which i could have  
>>> the
>>> master get updated(+committed to) at one rate and slaves  
>>> committing those
>>> updates at a different rate.
>> I see , the argument. But , isn't it better to keep both the mster  
>> and
>> slave as consistent as possible? There is no use in committing in
>> master, if you do not plan to search on those docs. So the best thing
>> to do is do a commit only as frequently as you wish to commit in a
>> slave.
>>
>> On a different track, if we can have an option of disabling commit
>> after replication, is it worth it? So the user can trigger a commit
>> explicitly
>>
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>>
>>>> usually the pollInterval is kept to a small value like 10secs.  
>>>> there
>>>> is no harm in polling more frequently. This can ensure that the
>>>> replication happens at almost same time
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<jimmoefoe@gmail.com 
>>>> >
>>>> wrote:
>>>>>
>>>>> Hey Shalin,
>>>>> thanks for your prompt reply.
>>>>> To clarity:
>>>>> With the old script-based replication, I would snappull every x  
>>>>> minutes
>>>>> (say, on the order of 5 minutes).
>>>>> Assuming no index optimize occured ( I optimize 1-2 times a day  
>>>>> so we can
>>>>> disregard it for the sake of argument), the snappull would take  
>>>>> a few
>>>>> seconds to run on each iteration.
>>>>> I then have a crontab on all slaves that runs snapinstall on a  
>>>>> fixed
>>>>> time,
>>>>> lets say every 15 minutes from start of a round hour, inclusive.  
>>>>> (slave
>>>>> machine times are synced e.g via ntp) so that essentially all  
>>>>> slaves will
>>>>> begin a snapinstall exactly at the same time - assuming uniform  
>>>>> load and
>>>>> the
>>>>> fact they all have at this point in time the same snapshot since I
>>>>> snappull
>>>>> frequently - this leads to a fairly synchronized replication  
>>>>> across the
>>>>> board.
>>>>>
>>>>> With the new replication however, it seems that by binding the  
>>>>> pulling
>>>>> and
>>>>> installing as well specifying the timing in delta's only (as  
>>>>> opposed to
>>>>> "absolute-time" based like in crontab) we've essentially made it
>>>>> impossible
>>>>> to effectively keep multiple slaves up to date and synchronized;  
>>>>> e.g if
>>>>> we
>>>>> set poll interval to 15 minutes, a slight offset in the startup  
>>>>> times of
>>>>> the
>>>>> slaves (that can very much be the case for arbitrary resets/ 
>>>>> maintenance
>>>>> operations) can lead to deviations in snappull(+install) times.  
>>>>> this in
>>>>> turn
>>>>> is further made worse by the fact that the pollInterval is then  
>>>>> computed
>>>>> based on the offset of when the last commit *finished* - and  
>>>>> this number
>>>>> seems to have a higher variance, e.g due to warmup which might be
>>>>> different
>>>>> across machines based on the queries they've handled previously.
>>>>>
>>>>> To summarize, It seems to me like it might be beneficial to  
>>>>> introduce a
>>>>> second parameter that acts more like a crontab time-based  
>>>>> tableau, in so
>>>>> far
>>>>> that it can enable a user to specify when an actual commit  
>>>>> should occur -
>>>>> so
>>>>> then we can have the pollInterval set to a low value (e.g 60  
>>>>> seconds) but
>>>>> then specify to only perform a commit on the 0,15,30,45-minutes  
>>>>> of every
>>>>> hour. this makes the commit times on the slaves fairly  
>>>>> deterministic.
>>>>>
>>>>> Does this make sense or am i missing something with current in- 
>>>>> process
>>>>> replication?
>>>>>
>>>>> Thanks,
>>>>> -Chak
>>>>>
>>>>>
>>>>> Shalin Shekhar Mangar wrote:
>>>>>>
>>>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>>>>>> <ji...@gmail.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> In the old replication, I could snappull with multiple slaves
>>>>>>> asynchronously
>>>>>>> but perform the snapinstall on each at the same time (+- epsilon
>>>>>>> seconds),
>>>>>>> so that way production load balanced query serving will always  
>>>>>>> be
>>>>>>> consistent.
>>>>>>>
>>>>>>> With the new system it seems that i have no control over  
>>>>>>> syncing them,
>>>>>>> but
>>>>>>> rather it polls every few minutes and then decides the next  
>>>>>>> cycle based
>>>>>>> on
>>>>>>> last time it *finished* updating, so in any case I lose  
>>>>>>> control over
>>>>>>> the
>>>>>>> synchronization of snap installation across multiple slaves.
>>>>>>>
>>>>>>
>>>>>> That is true. How did you synchronize them with the script based
>>>>>> solution?
>>>>>> Assuming network bandwidth is equally distributed and all  
>>>>>> slaves are
>>>>>> equal
>>>>>> in hardware/configuration, the time difference between new  
>>>>>> searcher
>>>>>> registration on any slave should not be more then pollInterval,  
>>>>>> no?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Also, I noticed the default poll interval is 60 seconds. It  
>>>>>>> would seem
>>>>>>> that
>>>>>>> for such a rapid interval, what i mentioned above is a non  
>>>>>>> issue,
>>>>>>> however
>>>>>>> i
>>>>>>> am not clear how this works vis-a-vis the new searcher warmup?  
>>>>>>> for a
>>>>>>> considerable index size (20Million docs+) the warmup itself is  
>>>>>>> an
>>>>>>> expensive
>>>>>>> and somewhat lengthy process and if a new searcher opens and  
>>>>>>> warms up
>>>>>>> every
>>>>>>> minute, I am not at all sure i'll be able to serve queries with
>>>>>>> reasonable
>>>>>>> QTimes.
>>>>>>>
>>>>>>
>>>>>> If the pollInterval is 60 seconds, it does not mean that a new  
>>>>>> index is
>>>>>> fetched every 60 seconds. A new index is downloaded and  
>>>>>> installed on the
>>>>>> slave only if a commit happened on the master (i.e. the index was
>>>>>> actually
>>>>>> changed on the master).
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>>
>>>>
>>>
>>> --
>>> View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>


Re: Solr 1.4 Replication scheme

Posted by Yonik Seeley <yo...@lucidimagination.com>.
On Fri, Aug 14, 2009 at 1:48 PM, Jason
Rutherglen<ja...@gmail.com> wrote:
> This would be good! Especially for NRT where this problem is
> somewhat harder. I think we may need to look at caching readers
> per corresponding http session.

For something like distributed search I was thinking of a simple
reservation mechanism... let the client specify how long to hold open
that version of the index (perhaps still have a max number of open
versions to prevent an errant client from blowing things up).

-Yonik
http://www.lucidimagination.com


> The pitfall is expiring them
> before running out of RAM.
>
> On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeley<yo...@lucidimagination.com> wrote:
>> Longer term, it might be nice to enable clients to specify what
>> version of the index they were searching against.  This could be used
>> to prevent consistency issues across different slaves, even if they
>> commit at different times.  It could also be used in distributed
>> search to make sure the index didn't change between phases.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>>
>> 2009/8/14 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
>>> On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati<ji...@gmail.com> wrote:
>>>>
>>>> Hey Noble,
>>>> you are right in that this will solve the problem, however it implicitly
>>>> assumes that commits to the master are infrequent enough ( so that most
>>>> polling operations yield no update and only every few polls lead to an
>>>> actual commit. )
>>>> This is a relatively safe assumption in most cases, but one that couples the
>>>> master update policy with the performance of the slaves - if the master gets
>>>> updated (and committed to) frequently, slaves might face a commit on every
>>>> 1-2 poll's, much more than is feasible given new searcher warmup times..
>>>> In effect what this comes down to it seems is that i must make the master
>>>> commit frequency the same as i'd want the slaves to use - and this is
>>>> markedly different than previous behaviour with which i could have the
>>>> master get updated(+committed to) at one rate and slaves committing those
>>>> updates at a different rate.
>>> I see , the argument. But , isn't it better to keep both the mster and
>>> slave as consistent as possible? There is no use in committing in
>>> master, if you do not plan to search on those docs. So the best thing
>>> to do is do a commit only as frequently as you wish to commit in a
>>> slave.
>>>
>>> On a different track, if we can have an option of disabling commit
>>> after replication, is it worth it? So the user can trigger a commit
>>> explicitly
>>>
>>>>
>>>>
>>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>>>
>>>>> usually the pollInterval is kept to a small value like 10secs. there
>>>>> is no harm in polling more frequently. This can ensure that the
>>>>> replication happens at almost same time
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<ji...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Hey Shalin,
>>>>>> thanks for your prompt reply.
>>>>>> To clarity:
>>>>>> With the old script-based replication, I would snappull every x minutes
>>>>>> (say, on the order of 5 minutes).
>>>>>> Assuming no index optimize occured ( I optimize 1-2 times a day so we can
>>>>>> disregard it for the sake of argument), the snappull would take a few
>>>>>> seconds to run on each iteration.
>>>>>> I then have a crontab on all slaves that runs snapinstall on a fixed
>>>>>> time,
>>>>>> lets say every 15 minutes from start of a round hour, inclusive. (slave
>>>>>> machine times are synced e.g via ntp) so that essentially all slaves will
>>>>>> begin a snapinstall exactly at the same time - assuming uniform load and
>>>>>> the
>>>>>> fact they all have at this point in time the same snapshot since I
>>>>>> snappull
>>>>>> frequently - this leads to a fairly synchronized replication across the
>>>>>> board.
>>>>>>
>>>>>> With the new replication however, it seems that by binding the pulling
>>>>>> and
>>>>>> installing as well specifying the timing in delta's only (as opposed to
>>>>>> "absolute-time" based like in crontab) we've essentially made it
>>>>>> impossible
>>>>>> to effectively keep multiple slaves up to date and synchronized; e.g if
>>>>>> we
>>>>>> set poll interval to 15 minutes, a slight offset in the startup times of
>>>>>> the
>>>>>> slaves (that can very much be the case for arbitrary resets/maintenance
>>>>>> operations) can lead to deviations in snappull(+install) times. this in
>>>>>> turn
>>>>>> is further made worse by the fact that the pollInterval is then computed
>>>>>> based on the offset of when the last commit *finished* - and this number
>>>>>> seems to have a higher variance, e.g due to warmup which might be
>>>>>> different
>>>>>> across machines based on the queries they've handled previously.
>>>>>>
>>>>>> To summarize, It seems to me like it might be beneficial to introduce a
>>>>>> second parameter that acts more like a crontab time-based tableau, in so
>>>>>> far
>>>>>> that it can enable a user to specify when an actual commit should occur -
>>>>>> so
>>>>>> then we can have the pollInterval set to a low value (e.g 60 seconds) but
>>>>>> then specify to only perform a commit on the 0,15,30,45-minutes of every
>>>>>> hour. this makes the commit times on the slaves fairly deterministic.
>>>>>>
>>>>>> Does this make sense or am i missing something with current in-process
>>>>>> replication?
>>>>>>
>>>>>> Thanks,
>>>>>> -Chak
>>>>>>
>>>>>>
>>>>>> Shalin Shekhar Mangar wrote:
>>>>>>>
>>>>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>>>>>>> <ji...@gmail.com>wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> In the old replication, I could snappull with multiple slaves
>>>>>>>> asynchronously
>>>>>>>> but perform the snapinstall on each at the same time (+- epsilon
>>>>>>>> seconds),
>>>>>>>> so that way production load balanced query serving will always be
>>>>>>>> consistent.
>>>>>>>>
>>>>>>>> With the new system it seems that i have no control over syncing them,
>>>>>>>> but
>>>>>>>> rather it polls every few minutes and then decides the next cycle based
>>>>>>>> on
>>>>>>>> last time it *finished* updating, so in any case I lose control over
>>>>>>>> the
>>>>>>>> synchronization of snap installation across multiple slaves.
>>>>>>>>
>>>>>>>
>>>>>>> That is true. How did you synchronize them with the script based
>>>>>>> solution?
>>>>>>> Assuming network bandwidth is equally distributed and all slaves are
>>>>>>> equal
>>>>>>> in hardware/configuration, the time difference between new searcher
>>>>>>> registration on any slave should not be more then pollInterval, no?
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>>>>>>> that
>>>>>>>> for such a rapid interval, what i mentioned above is a non issue,
>>>>>>>> however
>>>>>>>> i
>>>>>>>> am not clear how this works vis-a-vis the new searcher warmup? for a
>>>>>>>> considerable index size (20Million docs+) the warmup itself is an
>>>>>>>> expensive
>>>>>>>> and somewhat lengthy process and if a new searcher opens and warms up
>>>>>>>> every
>>>>>>>> minute, I am not at all sure i'll be able to serve queries with
>>>>>>>> reasonable
>>>>>>>> QTimes.
>>>>>>>>
>>>>>>>
>>>>>>> If the pollInterval is 60 seconds, it does not mean that a new index is
>>>>>>> fetched every 60 seconds. A new index is downloaded and installed on the
>>>>>>> slave only if a commit happened on the master (i.e. the index was
>>>>>>> actually
>>>>>>> changed on the master).
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Shalin Shekhar Mangar.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -----------------------------------------------------
>>>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>
>

Re: Solr 1.4 Replication scheme

Posted by Jason Rutherglen <ja...@gmail.com>.
This would be good! Especially for NRT, where this problem is
somewhat harder. I think we may need to look at caching readers
per corresponding HTTP session. The pitfall is expiring them
before running out of RAM.

On Fri, Aug 14, 2009 at 6:34 AM, Yonik Seeley<yo...@lucidimagination.com> wrote:
> Longer term, it might be nice to enable clients to specify what
> version of the index they were searching against.  This could be used
> to prevent consistency issues across different slaves, even if they
> commit at different times.  It could also be used in distributed
> search to make sure the index didn't change between phases.
>
> -Yonik
> http://www.lucidimagination.com
>
>
>
> 2009/8/14 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
>> On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati<ji...@gmail.com> wrote:
>>>
>>> Hey Noble,
>>> you are right in that this will solve the problem, however it implicitly
>>> assumes that commits to the master are infrequent enough ( so that most
>>> polling operations yield no update and only every few polls lead to an
>>> actual commit. )
>>> This is a relatively safe assumption in most cases, but one that couples the
>>> master update policy with the performance of the slaves - if the master gets
>>> updated (and committed to) frequently, slaves might face a commit on every
>>> 1-2 poll's, much more than is feasible given new searcher warmup times..
>>> In effect what this comes down to it seems is that i must make the master
>>> commit frequency the same as i'd want the slaves to use - and this is
>>> markedly different than previous behaviour with which i could have the
>>> master get updated(+committed to) at one rate and slaves committing those
>>> updates at a different rate.
>> I see , the argument. But , isn't it better to keep both the mster and
>> slave as consistent as possible? There is no use in committing in
>> master, if you do not plan to search on those docs. So the best thing
>> to do is do a commit only as frequently as you wish to commit in a
>> slave.
>>
>> On a different track, if we can have an option of disabling commit
>> after replication, is it worth it? So the user can trigger a commit
>> explicitly
>>
>>>
>>>
>>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>>
>>>> usually the pollInterval is kept to a small value like 10secs. there
>>>> is no harm in polling more frequently. This can ensure that the
>>>> replication happens at almost same time
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<ji...@gmail.com>
>>>> wrote:
>>>>>
>>>>> Hey Shalin,
>>>>> thanks for your prompt reply.
>>>>> To clarity:
>>>>> With the old script-based replication, I would snappull every x minutes
>>>>> (say, on the order of 5 minutes).
>>>>> Assuming no index optimize occured ( I optimize 1-2 times a day so we can
>>>>> disregard it for the sake of argument), the snappull would take a few
>>>>> seconds to run on each iteration.
>>>>> I then have a crontab on all slaves that runs snapinstall on a fixed
>>>>> time,
>>>>> lets say every 15 minutes from start of a round hour, inclusive. (slave
>>>>> machine times are synced e.g via ntp) so that essentially all slaves will
>>>>> begin a snapinstall exactly at the same time - assuming uniform load and
>>>>> the
>>>>> fact they all have at this point in time the same snapshot since I
>>>>> snappull
>>>>> frequently - this leads to a fairly synchronized replication across the
>>>>> board.
>>>>>
>>>>> With the new replication however, it seems that by binding the pulling
>>>>> and
>>>>> installing as well specifying the timing in delta's only (as opposed to
>>>>> "absolute-time" based like in crontab) we've essentially made it
>>>>> impossible
>>>>> to effectively keep multiple slaves up to date and synchronized; e.g if
>>>>> we
>>>>> set poll interval to 15 minutes, a slight offset in the startup times of
>>>>> the
>>>>> slaves (that can very much be the case for arbitrary resets/maintenance
>>>>> operations) can lead to deviations in snappull(+install) times. this in
>>>>> turn
>>>>> is further made worse by the fact that the pollInterval is then computed
>>>>> based on the offset of when the last commit *finished* - and this number
>>>>> seems to have a higher variance, e.g due to warmup which might be
>>>>> different
>>>>> across machines based on the queries they've handled previously.
>>>>>
>>>>> To summarize, It seems to me like it might be beneficial to introduce a
>>>>> second parameter that acts more like a crontab time-based tableau, in so
>>>>> far
>>>>> that it can enable a user to specify when an actual commit should occur -
>>>>> so
>>>>> then we can have the pollInterval set to a low value (e.g 60 seconds) but
>>>>> then specify to only perform a commit on the 0,15,30,45-minutes of every
>>>>> hour. this makes the commit times on the slaves fairly deterministic.
>>>>>
>>>>> Does this make sense or am i missing something with current in-process
>>>>> replication?
>>>>>
>>>>> Thanks,
>>>>> -Chak
>>>>>
>>>>>
>>>>> Shalin Shekhar Mangar wrote:
>>>>>>
>>>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>>>>>> <ji...@gmail.com>wrote:
>>>>>>
>>>>>>>
>>>>>>> In the old replication, I could snappull with multiple slaves
>>>>>>> asynchronously
>>>>>>> but perform the snapinstall on each at the same time (+- epsilon
>>>>>>> seconds),
>>>>>>> so that way production load balanced query serving will always be
>>>>>>> consistent.
>>>>>>>
>>>>>>> With the new system it seems that i have no control over syncing them,
>>>>>>> but
>>>>>>> rather it polls every few minutes and then decides the next cycle based
>>>>>>> on
>>>>>>> last time it *finished* updating, so in any case I lose control over
>>>>>>> the
>>>>>>> synchronization of snap installation across multiple slaves.
>>>>>>>
>>>>>>
>>>>>> That is true. How did you synchronize them with the script based
>>>>>> solution?
>>>>>> Assuming network bandwidth is equally distributed and all slaves are
>>>>>> equal
>>>>>> in hardware/configuration, the time difference between new searcher
>>>>>> registration on any slave should not be more then pollInterval, no?
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>>>>>> that
>>>>>>> for such a rapid interval, what i mentioned above is a non issue,
>>>>>>> however
>>>>>>> i
>>>>>>> am not clear how this works vis-a-vis the new searcher warmup? for a
>>>>>>> considerable index size (20Million docs+) the warmup itself is an
>>>>>>> expensive
>>>>>>> and somewhat lengthy process and if a new searcher opens and warms up
>>>>>>> every
>>>>>>> minute, I am not at all sure i'll be able to serve queries with
>>>>>>> reasonable
>>>>>>> QTimes.
>>>>>>>
>>>>>>
>>>>>> If the pollInterval is 60 seconds, it does not mean that a new index is
>>>>>> fetched every 60 seconds. A new index is downloaded and installed on the
>>>>>> slave only if a commit happened on the master (i.e. the index was
>>>>>> actually
>>>>>> changed on the master).
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>>
>>>>
>>>
>>> --
>>> View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>

Re: Solr 1.4 Replication scheme

Posted by Yonik Seeley <yo...@lucidimagination.com>.
Longer term, it might be nice to enable clients to specify what
version of the index they were searching against.  This could be used
to prevent consistency issues across different slaves, even if they
commit at different times.  It could also be used in distributed
search to make sure the index didn't change between phases.

-Yonik
http://www.lucidimagination.com
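
Nothing like this exists in 1.4; purely to illustrate the idea, a toy
reservation table for open index versions might look like the sketch
below (plain Python, every name hypothetical - this is not a Solr API):

    import time

    class IndexVersionReservations:
        """Toy sketch: let clients pin an index version for a bounded time,
        with a cap on how many distinct versions may be held open at once."""

        def __init__(self, max_open_versions=3):
            self.max_open = max_open_versions
            self.expires = {}  # index version -> latest expiry timestamp

        def reserve(self, version, hold_seconds):
            now = time.time()
            # Drop reservations whose hold time has already elapsed.
            self.expires = {v: t for v, t in self.expires.items() if t > now}
            if version not in self.expires and len(self.expires) >= self.max_open:
                raise RuntimeError("too many index versions held open")
            self.expires[version] = max(self.expires.get(version, 0),
                                        now + hold_seconds)

        def is_held(self, version):
            return self.expires.get(version, 0) > time.time()

A searcher for a given version would only be closed once is_held() returns
false, which is the "expiring them before running out of RAM" pitfall
mentioned elsewhere in the thread.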



2009/8/14 Noble Paul നോബിള്‍  नोब्ळ् <no...@corp.aol.com>:
> On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati<ji...@gmail.com> wrote:
>>
>> Hey Noble,
>> you are right in that this will solve the problem, however it implicitly
>> assumes that commits to the master are infrequent enough ( so that most
>> polling operations yield no update and only every few polls lead to an
>> actual commit. )
>> This is a relatively safe assumption in most cases, but one that couples the
>> master update policy with the performance of the slaves - if the master gets
>> updated (and committed to) frequently, slaves might face a commit on every
>> 1-2 poll's, much more than is feasible given new searcher warmup times..
>> In effect what this comes down to it seems is that i must make the master
>> commit frequency the same as i'd want the slaves to use - and this is
>> markedly different than previous behaviour with which i could have the
>> master get updated(+committed to) at one rate and slaves committing those
>> updates at a different rate.
> I see , the argument. But , isn't it better to keep both the mster and
> slave as consistent as possible? There is no use in committing in
> master, if you do not plan to search on those docs. So the best thing
> to do is do a commit only as frequently as you wish to commit in a
> slave.
>
> On a different track, if we can have an option of disabling commit
> after replication, is it worth it? So the user can trigger a commit
> explicitly
>
>>
>>
>> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>>
>>> usually the pollInterval is kept to a small value like 10secs. there
>>> is no harm in polling more frequently. This can ensure that the
>>> replication happens at almost same time
>>>
>>>
>>>
>>>
>>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<ji...@gmail.com>
>>> wrote:
>>>>
>>>> Hey Shalin,
>>>> thanks for your prompt reply.
>>>> To clarity:
>>>> With the old script-based replication, I would snappull every x minutes
>>>> (say, on the order of 5 minutes).
>>>> Assuming no index optimize occured ( I optimize 1-2 times a day so we can
>>>> disregard it for the sake of argument), the snappull would take a few
>>>> seconds to run on each iteration.
>>>> I then have a crontab on all slaves that runs snapinstall on a fixed
>>>> time,
>>>> lets say every 15 minutes from start of a round hour, inclusive. (slave
>>>> machine times are synced e.g via ntp) so that essentially all slaves will
>>>> begin a snapinstall exactly at the same time - assuming uniform load and
>>>> the
>>>> fact they all have at this point in time the same snapshot since I
>>>> snappull
>>>> frequently - this leads to a fairly synchronized replication across the
>>>> board.
>>>>
>>>> With the new replication however, it seems that by binding the pulling
>>>> and
>>>> installing as well specifying the timing in delta's only (as opposed to
>>>> "absolute-time" based like in crontab) we've essentially made it
>>>> impossible
>>>> to effectively keep multiple slaves up to date and synchronized; e.g if
>>>> we
>>>> set poll interval to 15 minutes, a slight offset in the startup times of
>>>> the
>>>> slaves (that can very much be the case for arbitrary resets/maintenance
>>>> operations) can lead to deviations in snappull(+install) times. this in
>>>> turn
>>>> is further made worse by the fact that the pollInterval is then computed
>>>> based on the offset of when the last commit *finished* - and this number
>>>> seems to have a higher variance, e.g due to warmup which might be
>>>> different
>>>> across machines based on the queries they've handled previously.
>>>>
>>>> To summarize, It seems to me like it might be beneficial to introduce a
>>>> second parameter that acts more like a crontab time-based tableau, in so
>>>> far
>>>> that it can enable a user to specify when an actual commit should occur -
>>>> so
>>>> then we can have the pollInterval set to a low value (e.g 60 seconds) but
>>>> then specify to only perform a commit on the 0,15,30,45-minutes of every
>>>> hour. this makes the commit times on the slaves fairly deterministic.
>>>>
>>>> Does this make sense or am i missing something with current in-process
>>>> replication?
>>>>
>>>> Thanks,
>>>> -Chak
>>>>
>>>>
>>>> Shalin Shekhar Mangar wrote:
>>>>>
>>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>>>>> <ji...@gmail.com>wrote:
>>>>>
>>>>>>
>>>>>> In the old replication, I could snappull with multiple slaves
>>>>>> asynchronously
>>>>>> but perform the snapinstall on each at the same time (+- epsilon
>>>>>> seconds),
>>>>>> so that way production load balanced query serving will always be
>>>>>> consistent.
>>>>>>
>>>>>> With the new system it seems that i have no control over syncing them,
>>>>>> but
>>>>>> rather it polls every few minutes and then decides the next cycle based
>>>>>> on
>>>>>> last time it *finished* updating, so in any case I lose control over
>>>>>> the
>>>>>> synchronization of snap installation across multiple slaves.
>>>>>>
>>>>>
>>>>> That is true. How did you synchronize them with the script based
>>>>> solution?
>>>>> Assuming network bandwidth is equally distributed and all slaves are
>>>>> equal
>>>>> in hardware/configuration, the time difference between new searcher
>>>>> registration on any slave should not be more then pollInterval, no?
>>>>>
>>>>>
>>>>>>
>>>>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>>>>> that
>>>>>> for such a rapid interval, what i mentioned above is a non issue,
>>>>>> however
>>>>>> i
>>>>>> am not clear how this works vis-a-vis the new searcher warmup? for a
>>>>>> considerable index size (20Million docs+) the warmup itself is an
>>>>>> expensive
>>>>>> and somewhat lengthy process and if a new searcher opens and warms up
>>>>>> every
>>>>>> minute, I am not at all sure i'll be able to serve queries with
>>>>>> reasonable
>>>>>> QTimes.
>>>>>>
>>>>>
>>>>> If the pollInterval is 60 seconds, it does not mean that a new index is
>>>>> fetched every 60 seconds. A new index is downloaded and installed on the
>>>>> slave only if a commit happened on the master (i.e. the index was
>>>>> actually
>>>>> changed on the master).
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Shalin Shekhar Mangar.
>>>>>
>>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> -----------------------------------------------------
>>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>>
>>>
>>
>> --
>> View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>
>
>
> --
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
>

Re: Solr 1.4 Replication scheme

Posted by Chris Hostetter <ho...@fucit.org>.
: > This is a relatively safe assumption in most cases, but one that couples the
: > master update policy with the performance of the slaves - if the master gets
: > updated (and committed to) frequently, slaves might face a commit on every
: > 1-2 poll's, much more than is feasible given new searcher warmup times..
: > In effect what this comes down to it seems is that i must make the master
: > commit frequency the same as i'd want the slaves to use - and this is
: > markedly different than previous behaviour with which i could have the
: > master get updated(+committed to) at one rate and slaves committing those
: > updates at a different rate.

: I see , the argument. But , isn't it better to keep both the mster and
: slave as consistent as possible? There is no use in committing in
: master, if you do not plan to search on those docs. So the best thing
: to do is do a commit only as frequently as you wish to commit in a
: slave.

I would advise against thinking that way when designing anything related to
replication -- people should call commit based on when they want the
documents they've added to be available to consumers. For a single-box
setup, your consumers are the people executing searches, but for a
multi-tier setup your consumers are the slaves replicating from you (the
master) -- and your consumers may not all have equal concerns about
freshness. Some of the slaves may want to poll for new updates from you as
fast as possible and have the freshest data at the expense of lower cache
hit rates and increased network IO; others may be happier with stale data
in return for better cache hit rates or lower network IO. (Even in a
realtime search situation, you may also be replicating to a slave in a
remote data center with a small network pipe that only wants one snappull
a day for backend analytics and an extremely consistent view of the index
for a long duration of analysis.)


The point being: we shouldn't assume/expect that slaves will always want
updates as fast as possible, or that all slaves of a single master will
want all updates with equal urgency ... individual slaves need to be able
to choose.



-Hoss


Re: Solr 1.4 Replication scheme

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati<ji...@gmail.com> wrote:
>
> Hey Noble,
> you are right in that this will solve the problem, however it implicitly
> assumes that commits to the master are infrequent enough ( so that most
> polling operations yield no update and only every few polls lead to an
> actual commit. )
> This is a relatively safe assumption in most cases, but one that couples the
> master update policy with the performance of the slaves - if the master gets
> updated (and committed to) frequently, slaves might face a commit on every
> 1-2 poll's, much more than is feasible given new searcher warmup times..
> In effect what this comes down to it seems is that i must make the master
> commit frequency the same as i'd want the slaves to use - and this is
> markedly different than previous behaviour with which i could have the
> master get updated(+committed to) at one rate and slaves committing those
> updates at a different rate.
I see the argument. But isn't it better to keep the master and the
slave as consistent as possible? There is no use in committing on the
master if you do not plan to search those docs. So the best thing
to do is to commit only as frequently as you wish to commit on a
slave.

On a different track: if we added an option to disable the commit
after replication, would it be worth it? The user could then trigger
the commit explicitly.

>
>
> Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
>>
>> usually the pollInterval is kept to a small value like 10secs. there
>> is no harm in polling more frequently. This can ensure that the
>> replication happens at almost same time
>>
>>
>>
>>
>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<ji...@gmail.com>
>> wrote:
>>>
>>> Hey Shalin,
>>> thanks for your prompt reply.
>>> To clarity:
>>> With the old script-based replication, I would snappull every x minutes
>>> (say, on the order of 5 minutes).
>>> Assuming no index optimize occured ( I optimize 1-2 times a day so we can
>>> disregard it for the sake of argument), the snappull would take a few
>>> seconds to run on each iteration.
>>> I then have a crontab on all slaves that runs snapinstall on a fixed
>>> time,
>>> lets say every 15 minutes from start of a round hour, inclusive. (slave
>>> machine times are synced e.g via ntp) so that essentially all slaves will
>>> begin a snapinstall exactly at the same time - assuming uniform load and
>>> the
>>> fact they all have at this point in time the same snapshot since I
>>> snappull
>>> frequently - this leads to a fairly synchronized replication across the
>>> board.
>>>
>>> With the new replication however, it seems that by binding the pulling
>>> and
>>> installing as well specifying the timing in delta's only (as opposed to
>>> "absolute-time" based like in crontab) we've essentially made it
>>> impossible
>>> to effectively keep multiple slaves up to date and synchronized; e.g if
>>> we
>>> set poll interval to 15 minutes, a slight offset in the startup times of
>>> the
>>> slaves (that can very much be the case for arbitrary resets/maintenance
>>> operations) can lead to deviations in snappull(+install) times. this in
>>> turn
>>> is further made worse by the fact that the pollInterval is then computed
>>> based on the offset of when the last commit *finished* - and this number
>>> seems to have a higher variance, e.g due to warmup which might be
>>> different
>>> across machines based on the queries they've handled previously.
>>>
>>> To summarize, It seems to me like it might be beneficial to introduce a
>>> second parameter that acts more like a crontab time-based tableau, in so
>>> far
>>> that it can enable a user to specify when an actual commit should occur -
>>> so
>>> then we can have the pollInterval set to a low value (e.g 60 seconds) but
>>> then specify to only perform a commit on the 0,15,30,45-minutes of every
>>> hour. this makes the commit times on the slaves fairly deterministic.
>>>
>>> Does this make sense or am i missing something with current in-process
>>> replication?
>>>
>>> Thanks,
>>> -Chak
>>>
>>>
>>> Shalin Shekhar Mangar wrote:
>>>>
>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>>>> <ji...@gmail.com>wrote:
>>>>
>>>>>
>>>>> In the old replication, I could snappull with multiple slaves
>>>>> asynchronously
>>>>> but perform the snapinstall on each at the same time (+- epsilon
>>>>> seconds),
>>>>> so that way production load balanced query serving will always be
>>>>> consistent.
>>>>>
>>>>> With the new system it seems that i have no control over syncing them,
>>>>> but
>>>>> rather it polls every few minutes and then decides the next cycle based
>>>>> on
>>>>> last time it *finished* updating, so in any case I lose control over
>>>>> the
>>>>> synchronization of snap installation across multiple slaves.
>>>>>
>>>>
>>>> That is true. How did you synchronize them with the script based
>>>> solution?
>>>> Assuming network bandwidth is equally distributed and all slaves are
>>>> equal
>>>> in hardware/configuration, the time difference between new searcher
>>>> registration on any slave should not be more then pollInterval, no?
>>>>
>>>>
>>>>>
>>>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>>>> that
>>>>> for such a rapid interval, what i mentioned above is a non issue,
>>>>> however
>>>>> i
>>>>> am not clear how this works vis-a-vis the new searcher warmup? for a
>>>>> considerable index size (20Million docs+) the warmup itself is an
>>>>> expensive
>>>>> and somewhat lengthy process and if a new searcher opens and warms up
>>>>> every
>>>>> minute, I am not at all sure i'll be able to serve queries with
>>>>> reasonable
>>>>> QTimes.
>>>>>
>>>>
>>>> If the pollInterval is 60 seconds, it does not mean that a new index is
>>>> fetched every 60 seconds. A new index is downloaded and installed on the
>>>> slave only if a commit happened on the master (i.e. the index was
>>>> actually
>>>> changed on the master).
>>>>
>>>> --
>>>> Regards,
>>>> Shalin Shekhar Mangar.
>>>>
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul | Principal Engineer| AOL | http://aol.com
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Solr 1.4 Replication scheme

Posted by KaktuChakarabati <ji...@gmail.com>.
Hey Noble,
You are right that this would solve the problem; however, it implicitly
assumes that commits to the master are infrequent enough that most
polling operations yield no update and only every few polls lead to an
actual commit.
This is a relatively safe assumption in most cases, but one that couples
the master's update policy to the performance of the slaves - if the master
gets updated (and committed to) frequently, slaves might face a commit on
every 1-2 polls, much more often than is feasible given new searcher warmup
times.
In effect, what this comes down to is that I must make the master's commit
frequency the same as the one I'd want the slaves to use - and this is
markedly different from the previous behaviour, where I could have the
master updated (and committed to) at one rate and the slaves committing
those updates at a different rate.


Noble Paul നോബിള്‍  नोब्ळ्-2 wrote:
> 
> usually the pollInterval is kept to a small value like 10secs. there
> is no harm in polling more frequently. This can ensure that the
> replication happens at almost same time
> 
> 
> 
> 
> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<ji...@gmail.com>
> wrote:
>>
>> Hey Shalin,
>> thanks for your prompt reply.
>> To clarity:
>> With the old script-based replication, I would snappull every x minutes
>> (say, on the order of 5 minutes).
>> Assuming no index optimize occured ( I optimize 1-2 times a day so we can
>> disregard it for the sake of argument), the snappull would take a few
>> seconds to run on each iteration.
>> I then have a crontab on all slaves that runs snapinstall on a fixed
>> time,
>> lets say every 15 minutes from start of a round hour, inclusive. (slave
>> machine times are synced e.g via ntp) so that essentially all slaves will
>> begin a snapinstall exactly at the same time - assuming uniform load and
>> the
>> fact they all have at this point in time the same snapshot since I
>> snappull
>> frequently - this leads to a fairly synchronized replication across the
>> board.
>>
>> With the new replication however, it seems that by binding the pulling
>> and
>> installing as well specifying the timing in delta's only (as opposed to
>> "absolute-time" based like in crontab) we've essentially made it
>> impossible
>> to effectively keep multiple slaves up to date and synchronized; e.g if
>> we
>> set poll interval to 15 minutes, a slight offset in the startup times of
>> the
>> slaves (that can very much be the case for arbitrary resets/maintenance
>> operations) can lead to deviations in snappull(+install) times. this in
>> turn
>> is further made worse by the fact that the pollInterval is then computed
>> based on the offset of when the last commit *finished* - and this number
>> seems to have a higher variance, e.g due to warmup which might be
>> different
>> across machines based on the queries they've handled previously.
>>
>> To summarize, It seems to me like it might be beneficial to introduce a
>> second parameter that acts more like a crontab time-based tableau, in so
>> far
>> that it can enable a user to specify when an actual commit should occur -
>> so
>> then we can have the pollInterval set to a low value (e.g 60 seconds) but
>> then specify to only perform a commit on the 0,15,30,45-minutes of every
>> hour. this makes the commit times on the slaves fairly deterministic.
>>
>> Does this make sense or am i missing something with current in-process
>> replication?
>>
>> Thanks,
>> -Chak
>>
>>
>> Shalin Shekhar Mangar wrote:
>>>
>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>>> <ji...@gmail.com>wrote:
>>>
>>>>
>>>> In the old replication, I could snappull with multiple slaves
>>>> asynchronously
>>>> but perform the snapinstall on each at the same time (+- epsilon
>>>> seconds),
>>>> so that way production load balanced query serving will always be
>>>> consistent.
>>>>
>>>> With the new system it seems that i have no control over syncing them,
>>>> but
>>>> rather it polls every few minutes and then decides the next cycle based
>>>> on
>>>> last time it *finished* updating, so in any case I lose control over
>>>> the
>>>> synchronization of snap installation across multiple slaves.
>>>>
>>>
>>> That is true. How did you synchronize them with the script based
>>> solution?
>>> Assuming network bandwidth is equally distributed and all slaves are
>>> equal
>>> in hardware/configuration, the time difference between new searcher
>>> registration on any slave should not be more then pollInterval, no?
>>>
>>>
>>>>
>>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>>> that
>>>> for such a rapid interval, what i mentioned above is a non issue,
>>>> however
>>>> i
>>>> am not clear how this works vis-a-vis the new searcher warmup? for a
>>>> considerable index size (20Million docs+) the warmup itself is an
>>>> expensive
>>>> and somewhat lengthy process and if a new searcher opens and warms up
>>>> every
>>>> minute, I am not at all sure i'll be able to serve queries with
>>>> reasonable
>>>> QTimes.
>>>>
>>>
>>> If the pollInterval is 60 seconds, it does not mean that a new index is
>>> fetched every 60 seconds. A new index is downloaded and installed on the
>>> slave only if a commit happened on the master (i.e. the index was
>>> actually
>>> changed on the master).
>>>
>>> --
>>> Regards,
>>> Shalin Shekhar Mangar.
>>>
>>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> -----------------------------------------------------
> Noble Paul | Principal Engineer| AOL | http://aol.com
> 
> 

-- 
View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 1.4 Replication scheme

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@corp.aol.com>.
Usually the pollInterval is kept to a small value, like 10 seconds; there
is no harm in polling more frequently. This can ensure that the
replication happens at almost the same time on all slaves.




On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati<ji...@gmail.com> wrote:
>
> Hey Shalin,
> thanks for your prompt reply.
> To clarity:
> With the old script-based replication, I would snappull every x minutes
> (say, on the order of 5 minutes).
> Assuming no index optimize occured ( I optimize 1-2 times a day so we can
> disregard it for the sake of argument), the snappull would take a few
> seconds to run on each iteration.
> I then have a crontab on all slaves that runs snapinstall on a fixed time,
> lets say every 15 minutes from start of a round hour, inclusive. (slave
> machine times are synced e.g via ntp) so that essentially all slaves will
> begin a snapinstall exactly at the same time - assuming uniform load and the
> fact they all have at this point in time the same snapshot since I snappull
> frequently - this leads to a fairly synchronized replication across the
> board.
>
> With the new replication however, it seems that by binding the pulling and
> installing as well specifying the timing in delta's only (as opposed to
> "absolute-time" based like in crontab) we've essentially made it impossible
> to effectively keep multiple slaves up to date and synchronized; e.g if we
> set poll interval to 15 minutes, a slight offset in the startup times of the
> slaves (that can very much be the case for arbitrary resets/maintenance
> operations) can lead to deviations in snappull(+install) times. this in turn
> is further made worse by the fact that the pollInterval is then computed
> based on the offset of when the last commit *finished* - and this number
> seems to have a higher variance, e.g due to warmup which might be different
> across machines based on the queries they've handled previously.
>
> To summarize, It seems to me like it might be beneficial to introduce a
> second parameter that acts more like a crontab time-based tableau, in so far
> that it can enable a user to specify when an actual commit should occur - so
> then we can have the pollInterval set to a low value (e.g 60 seconds) but
> then specify to only perform a commit on the 0,15,30,45-minutes of every
> hour. this makes the commit times on the slaves fairly deterministic.
>
> Does this make sense or am i missing something with current in-process
> replication?
>
> Thanks,
> -Chak
>
>
> Shalin Shekhar Mangar wrote:
>>
>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
>> <ji...@gmail.com>wrote:
>>
>>>
>>> In the old replication, I could snappull with multiple slaves
>>> asynchronously
>>> but perform the snapinstall on each at the same time (+- epsilon
>>> seconds),
>>> so that way production load balanced query serving will always be
>>> consistent.
>>>
>>> With the new system it seems that i have no control over syncing them,
>>> but
>>> rather it polls every few minutes and then decides the next cycle based
>>> on
>>> last time it *finished* updating, so in any case I lose control over the
>>> synchronization of snap installation across multiple slaves.
>>>
>>
>> That is true. How did you synchronize them with the script based solution?
>> Assuming network bandwidth is equally distributed and all slaves are equal
>> in hardware/configuration, the time difference between new searcher
>> registration on any slave should not be more then pollInterval, no?
>>
>>
>>>
>>> Also, I noticed the default poll interval is 60 seconds. It would seem
>>> that
>>> for such a rapid interval, what i mentioned above is a non issue, however
>>> i
>>> am not clear how this works vis-a-vis the new searcher warmup? for a
>>> considerable index size (20Million docs+) the warmup itself is an
>>> expensive
>>> and somewhat lengthy process and if a new searcher opens and warms up
>>> every
>>> minute, I am not at all sure i'll be able to serve queries with
>>> reasonable
>>> QTimes.
>>>
>>
>> If the pollInterval is 60 seconds, it does not mean that a new index is
>> fetched every 60 seconds. A new index is downloaded and installed on the
>> slave only if a commit happened on the master (i.e. the index was actually
>> changed on the master).
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>>
>
> --
> View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com

Re: Solr 1.4 Replication scheme

Posted by KaktuChakarabati <ji...@gmail.com>.
Hey Shalin,
Thanks for your prompt reply.
To clarify:
With the old script-based replication, I would snappull every x minutes
(say, on the order of 5 minutes).
Assuming no index optimize occurred (I optimize 1-2 times a day, so we can
disregard it for the sake of argument), the snappull would take a few
seconds to run on each iteration.
I then have a crontab on all slaves that runs snapinstall at a fixed time,
let's say every 15 minutes from the start of a round hour, inclusive.
(Slave machine times are synced, e.g. via ntp.) So essentially all slaves
begin a snapinstall at exactly the same time - and assuming uniform load
and the fact that they all have the same snapshot at this point (since I
snappull frequently), this leads to a fairly synchronized replication
across the board.

With the new replication, however, it seems that by binding the pulling
and installing together, and by specifying the timing only as deltas (as
opposed to "absolute time", as in crontab), we've essentially made it
impossible to keep multiple slaves up to date and synchronized
effectively; e.g. if we set the poll interval to 15 minutes, a slight
offset in the startup times of the slaves (which can very much happen
after arbitrary resets/maintenance operations) can lead to deviations in
snappull(+install) times. This in turn is made worse by the fact that the
next poll is computed from when the last commit *finished* - and that
number seems to have a higher variance, e.g. due to warmup, which can
differ across machines based on the queries they've handled previously.

To summarize, it seems to me it might be beneficial to introduce a second
parameter that acts more like a crontab-style, time-based schedule, so
that a user can specify when an actual commit should occur. We could then
have the pollInterval set to a low value (e.g. 60 seconds) but specify
that a commit should only be performed at minutes 0, 15, 30 and 45 of
every hour. This makes the commit times on the slaves fairly
deterministic.

Does this make sense, or am I missing something about the current
in-process replication?

Thanks,
-Chak
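
Until something like that exists, the proposed schedule can be
approximated from the outside: leave polling disabled and have a single
job wake up on the quarter hour and hit fetchindex on every slave. A
rough sketch with the same assumptions as the earlier one (1.4
ReplicationHandler commands, placeholder slave URLs):

    import time
    from urllib.request import urlopen

    SLAVES = ["http://slave1:8983/solr", "http://slave2:8983/solr"]  # placeholders

    def seconds_until_next_quarter_hour():
        period = 15 * 60
        return period - (time.time() % period)

    # Roughly equivalent cron entry: */15 * * * * python trigger_pull.py
    # (trigger_pull.py is a hypothetical name for the inner loop below)
    while True:
        time.sleep(seconds_until_next_quarter_hour())
        for slave in SLAVES:
            # Install a new index on every slave on each 15-minute boundary.
            urlopen(slave + "/replication?command=fetchindex").read()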


Shalin Shekhar Mangar wrote:
> 
> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati
> <ji...@gmail.com>wrote:
> 
>>
>> In the old replication, I could snappull with multiple slaves
>> asynchronously
>> but perform the snapinstall on each at the same time (+- epsilon
>> seconds),
>> so that way production load balanced query serving will always be
>> consistent.
>>
>> With the new system it seems that i have no control over syncing them,
>> but
>> rather it polls every few minutes and then decides the next cycle based
>> on
>> last time it *finished* updating, so in any case I lose control over the
>> synchronization of snap installation across multiple slaves.
>>
> 
> That is true. How did you synchronize them with the script based solution?
> Assuming network bandwidth is equally distributed and all slaves are equal
> in hardware/configuration, the time difference between new searcher
> registration on any slave should not be more then pollInterval, no?
> 
> 
>>
>> Also, I noticed the default poll interval is 60 seconds. It would seem
>> that
>> for such a rapid interval, what i mentioned above is a non issue, however
>> i
>> am not clear how this works vis-a-vis the new searcher warmup? for a
>> considerable index size (20Million docs+) the warmup itself is an
>> expensive
>> and somewhat lengthy process and if a new searcher opens and warms up
>> every
>> minute, I am not at all sure i'll be able to serve queries with
>> reasonable
>> QTimes.
>>
> 
> If the pollInterval is 60 seconds, it does not mean that a new index is
> fetched every 60 seconds. A new index is downloaded and installed on the
> slave only if a commit happened on the master (i.e. the index was actually
> changed on the master).
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 1.4 Replication scheme

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati <ji...@gmail.com>wrote:

>
> In the old replication, I could snappull with multiple slaves
> asynchronously
> but perform the snapinstall on each at the same time (+- epsilon seconds),
> so that way production load balanced query serving will always be
> consistent.
>
> With the new system it seems that i have no control over syncing them, but
> rather it polls every few minutes and then decides the next cycle based on
> last time it *finished* updating, so in any case I lose control over the
> synchronization of snap installation across multiple slaves.
>

That is true. How did you synchronize them with the script-based solution?
Assuming network bandwidth is equally distributed and all slaves are equal
in hardware/configuration, the time difference between new searcher
registration on any two slaves should not be more than the pollInterval, no?


>
> Also, I noticed the default poll interval is 60 seconds. It would seem that
> for such a rapid interval, what i mentioned above is a non issue, however i
> am not clear how this works vis-a-vis the new searcher warmup? for a
> considerable index size (20Million docs+) the warmup itself is an expensive
> and somewhat lengthy process and if a new searcher opens and warms up every
> minute, I am not at all sure i'll be able to serve queries with reasonable
> QTimes.
>

If the pollInterval is 60 seconds, it does not mean that a new index is
fetched every 60 seconds. A new index is downloaded and installed on the
slave only if a commit happened on the master (i.e. the index was actually
changed on the master).
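
In other words, a 60-second poll is cheap when nothing has changed: the
slave asks the master for its current index version and only downloads
files when that version differs from its own. That signal can be watched
directly against the master; a rough sketch (placeholder host, and the XML
field name in the regex is an assumption):

    import re
    import time
    from urllib.request import urlopen

    MASTER = "http://master:8983/solr"  # placeholder

    def master_index_version():
        # The indexversion command reports the latest replicatable
        # index version on the master.
        with urlopen(MASTER + "/replication?command=indexversion") as resp:
            body = resp.read().decode("utf-8")
        # Field name assumed; adjust the pattern to the actual response.
        m = re.search(r'name="indexversion">(-?\d+)', body, re.IGNORECASE)
        return m.group(1) if m else None

    last = None
    while True:
        current = master_index_version()
        if current != last:
            # Only in this case would a polling slave download anything.
            print("master index version is now", current)
            last = current
        time.sleep(60)  # same cadence as the default pollInterval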

-- 
Regards,
Shalin Shekhar Mangar.