Posted to dev@lucene.apache.org by Kranti Parisa <kr...@gmail.com> on 2013/08/05 06:14:23 UTC

Threshold Checks for Replication in solrconfig.xml

Hi,

I think it would be nice to be able to configure Solr to run threshold
checks before replicating the index. This would stop a bad index from being
copied over to the slaves, which are ideally the ones serving user requests.

In our case, we have a Solr indexer that indexes the documents. Before
starting the indexing process we disable replication and then index the
documents. We then perform the threshold checks, and if the index looks
reasonable we enable replication, so that the Solr query engines have a
good index to serve user queries.
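
The workflow above could be scripted roughly as follows. This is only a
sketch of the manual procedure, not an existing Solr feature. It assumes the
master core is reachable over HTTP at a base URL you supply, uses the
ReplicationHandler commands disablereplication and enablereplication, and
reads numFound from a rows=0 query. The function names, the checks mapping,
and the injected run_indexing and http_get callables are all hypothetical.

```python
import json
import urllib.parse
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode the JSON body (default transport)."""
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def count_docs(base_url, query, http_get=http_get_json):
    """Return numFound for a query without fetching any documents."""
    params = urllib.parse.urlencode({"q": query, "rows": 0, "wt": "json"})
    body = http_get(f"{base_url}/select?{params}")
    return body["response"]["numFound"]


def replication_command(base_url, command, http_get=http_get_json):
    """Send a ReplicationHandler command such as 'disablereplication'."""
    params = urllib.parse.urlencode({"command": command, "wt": "json"})
    return http_get(f"{base_url}/replication?{params}")


def index_with_threshold_checks(base_url, run_indexing, checks,
                                http_get=http_get_json):
    """Disable replication, index, then re-enable only if all checks pass.

    checks maps a query string to the minimum acceptable document count.
    Returns the list of failed queries (empty means replication was enabled).
    """
    replication_command(base_url, "disablereplication", http_get)
    run_indexing()
    failed = [q for q, minimum in checks.items()
              if count_docs(base_url, q, http_get) < minimum]
    if not failed:
        replication_command(base_url, "enablereplication", http_get)
    return failed
```

If any check fails, replication simply stays disabled and the slaves keep
serving the previous index until someone investigates.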

I have been thinking about how it would be if Solr offered this facility
(in solrconfig.xml) by default for everyone.

We could have something like this inside the Replication Request Handler
section. Either the master checks before enabling replication, or each
slave checks against the master before downloading the index, whichever is
best. I think it is better for the master to do this check, so that the
slaves need not all check the same thing against the master:

  <lst name="thresholdchecks">
    <str query="id:[* TO *]">100000</str>
    <str query="id:[* TO *] AND type:movie">40000</str>
    <str query="id:[* TO *] AND type:music">10000</str>
  </lst>
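
For illustration, the proposed fragment could be read and evaluated along
these lines. Solr does not interpret a thresholdchecks element today, and
the query attribute on <str> is part of the proposal rather than existing
solrconfig.xml syntax, so the parser below and its function names are
assumptions made only to show the intended semantics: each entry pairs a
query with the minimum number of matching documents.

```python
import xml.etree.ElementTree as ET

PROPOSED_CONFIG = """
<lst name="thresholdchecks">
  <str query="id:[* TO *]">100000</str>
  <str query="id:[* TO *] AND type:movie">40000</str>
  <str query="id:[* TO *] AND type:music">10000</str>
</lst>
"""


def parse_threshold_checks(xml_text):
    """Return {query: minimum_count} from the proposed <lst> fragment."""
    root = ET.fromstring(xml_text)
    if root.tag != "lst" or root.get("name") != "thresholdchecks":
        raise ValueError('expected <lst name="thresholdchecks">')
    checks = {}
    for entry in root.findall("str"):
        query = entry.get("query")
        if not query:
            raise ValueError("each <str> needs a query attribute")
        # int() raises ValueError if the element body is not a number.
        checks[query] = int((entry.text or "").strip())
    return checks


def failed_checks(checks, counts):
    """List queries whose observed count is below the configured minimum."""
    return [q for q, minimum in checks.items() if counts.get(q, 0) < minimum]
```

Here counts would hold the numFound value observed for each query on the
freshly built index; an empty result from failed_checks means the index
passed every threshold.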

I think this is a very common task for people using Solr replication. I am
interested in working on this feature and committing it. Before that, I
would like to know your views on it. If something like this already exists
or is coming up, please let me know!


Thanks & Regards,
Kranti K Parisa
http://www.linkedin.com/in/krantiparisa

Re: Threshold Checks for Replication in solrconfig.xml

Posted by mevan <s....@gmail.com>.
Hi,

I agree with Kranti on this...

There is a multitude of reasons that would cause an index to be "bad". One
such common case is when the index is built without any functional problems
(no errors, exceptions), but the document counts are low. This occurs when
there is a glitch in the document pipelines feeding the index. In this case,
the index is "correctly" built, but has only 50% or less of the expected
documents.

For a 24x7 system such as mine, which services > 500 million queries/day,
I'd rather keep an older index than replicate a fresh index (say, on a
Saturday night) that has 50% or less of the expected documents. This:
a) provides customers with a usable experience (though with somewhat stale
data)
b) gives my Ops team an opportunity to fix the issue during business hours,
or at the next convenient time. We can trigger alarms to Ops when this
occurs. Bottom line: customers are not severely impacted, and we can fix
the problem at the next available time.
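
The decision described above can be sketched as a simple ratio test. This
is an illustration only: the function name, the expected_count input (which
an operator would have to supply, for example from the previous build), and
the 0.5 cutoff taken from the "50% or less" figure are assumptions, not
anything Solr provides.

```python
def replication_decision(new_count, expected_count, min_ratio=0.5):
    """Decide whether a freshly built index should replace the live one.

    Returns (replicate, message). The index is held back when it contains
    min_ratio or less of the expected documents; 0.5 is an example value.
    """
    if expected_count <= 0:
        # Nothing to compare against (e.g. first build): allow replication.
        return True, "no expected count to compare against"
    ratio = new_count / expected_count
    if ratio <= min_ratio:
        return False, (f"ALERT: new index has {new_count} docs, "
                       f"{ratio:.0%} of the expected {expected_count}")
    return True, f"new index has {ratio:.0%} of the expected documents"
```

When the first element is False, the slaves would keep the older index and
the message could be forwarded to whatever alerting the Ops team uses.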

A fresh index is built every X hours. Therefore, this problem is bound to
happen sooner or later in a 24x7x365 system with multiple/dynamic data
sources. It's a safeguard that Solr needs.

Re: Threshold Checks for Replication in solrconfig.xml

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
I get the feeling you are concerned about a bad index. Don't you think we
should be trying to avoid replicating a bad index rather than adding
thresholds?


On Mon, Aug 5, 2013 at 11:55 AM, Kranti Parisa <kr...@gmail.com> wrote:

> Yes, we can disable replication and perform the checks manually, that is
> what we are doing currently. And yes, the idea of configuring threshold
> checks is to delay the replication in case of bad index (if threshold
> checks are not needed, we can avoid configuring the same). It would give us
> control over a bad index especially in the cases of frequent
> deletes/updates for the expired assets.


-- 
-----------------------------------------------------
Noble Paul

Re: Threshold Checks for Replication in solrconfig.xml

Posted by Kranti Parisa <kr...@gmail.com>.
Yes, we can disable replication and perform the checks manually; that is
what we are doing currently. And yes, the idea of configuring threshold
checks is to delay replication in case of a bad index (if threshold checks
are not needed, they can simply be left unconfigured). It would give us
control over a bad index, especially in cases of frequent deletes/updates
for expired assets.



Thanks & Regards,
Kranti K Parisa
http://www.linkedin.com/in/krantiparisa



On Mon, Aug 5, 2013 at 2:12 AM, Noble Paul നോബിള്‍ नोब्ळ् <
noble.paul@gmail.com> wrote:

> What is the objective here?
>
> Now you can disable replication with a command on the master and re enable
> it later. Do you wish to make it a bit easier with this?
>
>
> the threshold check according to your example will delay the replication
> forever if the threshold is not reached at all
>
> This is only useful if you are doing a fresh reindex

Re: Threshold Checks for Replication in solrconfig.xml

Posted by Noble Paul നോബിള്‍ नोब्ळ् <no...@gmail.com>.
What is the objective here?

You can already disable replication with a command on the master and
re-enable it later. Do you wish to make that a bit easier with this?

According to your example, the threshold check will delay replication
forever if the threshold is never reached.

This is only useful if you are doing a fresh reindex.

-- 
-----------------------------------------------------
Noble Paul