Posted to solr-user@lucene.apache.org by Bharath Kumar <bh...@gmail.com> on 2016/06/14 05:41:39 UTC

Regarding CDCR SOLR 6

Hi,

I have set up cross data center replication (CDCR) using Solr 6, and I want
to know why the buffer needs to be enabled on the source cluster. Even if the
buffer is not enabled, I am able to replicate the data between the source and
target sites. What are the advantages of enabling the buffer on the source
site? If I enable the buffer, the transaction logs are never deleted, and
over a period of time we run out of disk space. Can you please let me know
why enabling the buffer is required?

-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

Re: Regarding CDCR SOLR 6

Posted by Renaud Delbru <re...@siren.solutions>.
Hi,

On 15/06/16 03:18, Bharath Kumar wrote:
> Hi Renaud,
>
> Thank you so much for your response. It is very helpful and it helped me
> understand the need for turning on buffering.
>
> Is it recommended to keep the buffering enabled all the time on the
> source cluster? If the target cluster is up and running and the cdcr is
> started, can i turn off the buffering on the source site?

Yes, there is no need to keep buffering on once your target cluster is up
and running and CDCR replication has started.

> As you have mentioned, the transaction logs are kept on the source
> cluster, until the data is replicated on the target cluster, once the
> cdcr is started. Is there a possibility that target cluster is out of
> sync with the source cluster and we need to do a hard recovery from the
> source cluster to sync up the target cluster?

If the target cluster goes down while cdcr is replicating, there should 
be no loss of information. The source cluster will try from time to time 
to communicate with the target and continue the replication until the 
target cluster is back up and running. Until it can resume 
communication, the source cluster keeps a pointer to where the 
replication should resume, so update log entries from that point 
onward are not cleaned up.

The pointer on the source cluster is not persistent (maybe that could be 
something to implement). Therefore if the source cluster is restarted, 
the pointer will be lost, and buffer should be activated until the 
target cluster is up and running.
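
A minimal sketch of that startup order, assuming the CDCR API actions ENABLEBUFFER, START and DISABLEBUFFER on the source collection's /cdcr handler. The host name, port and collection name are placeholders, not values from this thread, and the sketch only builds the URLs; it does not call Solr.

```python
from urllib.parse import urlencode

# Placeholder values for illustration only; replace with your own.
SOURCE_SOLR = "http://source-host:8983/solr"
COLLECTION = "collection1"


def cdcr_url(action, base=SOURCE_SOLR, collection=COLLECTION):
    """Build the URL for one CDCR API action on the source collection."""
    query = urlencode({"action": action, "wt": "json"})
    return f"{base}/{collection}/cdcr?{query}"


def startup_sequence():
    """Order of calls: buffer first, then start, and drop the buffer last."""
    return [
        cdcr_url("ENABLEBUFFER"),   # keep every update while no log reader exists
        cdcr_url("START"),          # creates a log reader per reachable target
        cdcr_url("DISABLEBUFFER"),  # only once replication is confirmed running
    ]


if __name__ == "__main__":
    for url in startup_sequence():
        print(url)
```

The last call should only be issued after checking that the target is reachable and replication is running, otherwise the source is back in the unprotected state described above.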

>
> Also i have the below configuration on the source cluster to synchronize
> the update logs.
>   <lst name="updateLogSynchronizer">
>     <str name="schedule">1000</str>
>   </lst>
>
> Regarding the monitoring of the replication, i am planning to add a
> script to check the queue size, to make sure the disk is not full in
> case the target site is down and the transaction log size keeps growing
> on the source site.
> Is there any other recommended approach?

The best approach is to use the monitoring API, which provides metrics on 
how the replication is going. The cwiki [1] also has some 
recommendations on how to monitor the system.

[1] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62687462
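
A rough sketch of such a disk check, assuming the QUEUES action of the CDCR API returns top-level `tlogTotalSize` (bytes) and `tlogTotalCount` fields in its JSON response. The URL and the 5 GiB threshold are placeholders, and the response shape should be verified against your Solr version before relying on it.

```python
import json
from urllib.request import urlopen

# Placeholder values for illustration; not taken from the thread.
QUEUES_URL = "http://source-host:8983/solr/collection1/cdcr?action=QUEUES&wt=json"
MAX_TLOG_BYTES = 5 * 1024**3  # hypothetical alert threshold (5 GiB)


def check_tlog_growth(queues_response, max_bytes=MAX_TLOG_BYTES):
    """Return a warning string if the transaction logs exceed max_bytes, else None.

    Assumes the response carries top-level 'tlogTotalSize' (bytes) and
    'tlogTotalCount' fields; missing fields are treated as zero.
    """
    size = int(queues_response.get("tlogTotalSize", 0))
    count = int(queues_response.get("tlogTotalCount", 0))
    if size > max_bytes:
        return f"tlog size {size} bytes in {count} files exceeds {max_bytes} bytes"
    return None


def fetch_queues(url=QUEUES_URL, timeout=10):
    """Fetch and decode the QUEUES response from the source cluster."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    warning = check_tlog_growth(fetch_queues())
    print(warning or "transaction log size is within the threshold")
```

Run from cron or a monitoring agent, this flags transaction log growth on the source while the target is unreachable, which is the situation the question is concerned about.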

Kind Regards
-- 
Renaud Delbru



Re: Regarding CDCR SOLR 6

Posted by Bharath Kumar <bh...@gmail.com>.
Hi Renaud,

Thank you so much for your response. It is very helpful and it helped me
understand the need for turning on buffering.

Is it recommended to keep buffering enabled all the time on the source
cluster? If the target cluster is up and running and CDCR is started,
can I turn off buffering on the source site?

As you mentioned, once CDCR is started the transaction logs are kept on the
source cluster until the data is replicated to the target cluster. Is it
possible for the target cluster to get out of sync with the source cluster,
so that we need to do a hard recovery from the source cluster to sync up the
target cluster?

Also, I have the configuration below on the source cluster to synchronize
the update logs:
  <lst name="updateLogSynchronizer">
    <str name="schedule">1000</str>
  </lst>

Regarding monitoring of the replication, I am planning to add a script
to check the queue size, to make sure the disk does not fill up in case the
target site is down and the transaction logs keep growing on the
source site.
Is there any other recommended approach?

Thanks again, your inputs were very helpful.

On Tue, Jun 14, 2016 at 7:10 PM, Bharath Kumar <bh...@gmail.com>
wrote:

> Hi Renaud,
>
> Thank you so much for your response. It is very helpful and it helped me
> understand the need for turning on buffering.
>
> Is it recommended to keep the buffering enabled all the time on the source
> cluster? If the target cluster is up and running and the cdcr is started,
> can i turn off the buffering on the source site?
>
> As you have mentioned, the transaction logs are kept on the source
> cluster, until the data is replicated on the target cluster, once the cdcr
> is started, is there a possibility that if on the target cluster


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

Re: Regarding CDCR SOLR 6

Posted by Bharath Kumar <bh...@gmail.com>.
Hi Renaud,

Thank you so much for your response. It is very helpful and it helped me
understand the need for turning on buffering.

Is it recommended to keep the buffering enabled all the time on the source
cluster? If the target cluster is up and running and the cdcr is started,
can i turn off the buffering on the source site?

As you have mentioned, the transaction logs are kept on the source cluster,
until the data is replicated on the target cluster, once the cdcr is
started, is there a possibility that if on the target cluster



On Tue, Jun 14, 2016 at 6:50 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.davis@nih.gov> wrote:

> I must chime in to clarify something - in case 2, would the source cluster
> eventually start a log reader on its own?   That is, would the CDCR heal
> over time, or would manual action be required?


-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"

RE: Regarding CDCR SOLR 6

Posted by "Davis, Daniel (NIH/NLM) [C]" <da...@nih.gov>.
I must chime in to clarify something - in case 2, would the source cluster eventually start a log reader on its own?   That is, would the CDCR heal over time, or would manual action be required?

-----Original Message-----
From: Renaud Delbru [mailto:renaud@siren.solutions] 
Sent: Tuesday, June 14, 2016 4:51 AM
To: solr-user@lucene.apache.org
Subject: Re: Regarding CDCR SOLR 6

Hi Bharath,

The buffer is useful when you need to buffer updates on the source cluster before starting cdcr, if the source cluster might receive updates in the meanwhile and you want to be sure to not miss them.

To understand this better, you need to understand how cdcr clean transaction logs. Cdcr when started (with the START action) will instantiate a log reader for each target cluster. The position of the log reader will indicate cdcr which transaction logs it can clean. If all the log readers are beyond a certain point, then cdcr can clean all the transaction logs up to this point.

However, there might be cases when the source cluster will be up without any log readers instantiated:
1) The source cluster is started, but cdcr is not started yet
2) the source cluster is started, cdcr is started, but the target cluster was not accessible when cdcr was started. In this case, cdcr will not be able to instantiate a log reader for this cluster.

In these two scenarios, if updates are received by the source cluster, then they might be cleaned out from the transaction log as per the normal update log cleaning procedure.
That is where the buffer becomes useful. When you know that while starting up your clusters and cdcr, you will be in one of these two scenarios, then you can activate the buffer to be sure to not miss updates. Then when the source and target clusters are properly up and cdcr replication is properly started, you can turn off this buffer.


Re: Regarding CDCR SOLR 6

Posted by Renaud Delbru <re...@siren.solutions>.
Hi Bharath,

The buffer is useful when you need to buffer updates on the source 
cluster before starting cdcr, if the source cluster might receive 
updates in the meanwhile and you want to be sure to not miss them.

To understand this better, you need to understand how CDCR cleans 
transaction logs. When CDCR is started (with the START action), it 
instantiates a log reader for each target cluster. The position of each 
log reader tells CDCR which transaction logs it can clean. If 
all the log readers are beyond a certain point, then CDCR can clean all 
the transaction logs up to that point.

However, there might be cases when the source cluster will be up without 
any log readers instantiated:
1) The source cluster is started, but cdcr is not started yet
2) The source cluster is started and CDCR is started, but the target 
cluster was not accessible when CDCR was started. In this case, CDCR 
will not be able to instantiate a log reader for that cluster.

In these two scenarios, if updates are received by the source cluster, 
then they might be cleaned out from the transaction log as per the 
normal update log cleaning procedure.
That is where the buffer becomes useful. When you know that while 
starting up your clusters and cdcr, you will be in one of these two 
scenarios, then you can activate the buffer to be sure to not miss 
updates. Then when the source and target clusters are properly up and 
cdcr replication is properly started, you can turn off this buffer.
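
The cleaning rule described above can be pictured with a toy model. This is only an illustration of the reasoning, not Solr's implementation: the function name and the integer log positions are made up for the example, and "normal cleanup" is simplified to mean that everything may be discarded.

```python
def cleanable_up_to(reader_positions, buffer_enabled, latest_position):
    """Return the log position up to which old entries may be discarded.

    reader_positions: current position of each target's log reader (may be empty).
    buffer_enabled:   True if the CDCR buffer is on.
    latest_position:  position of the newest entry in the transaction log.
    """
    if buffer_enabled:
        return 0  # buffer on: keep everything
    if not reader_positions:
        # No reader and no buffer: normal cleanup applies, so entries that
        # were never replicated can be discarded (simplified here to "all").
        return latest_position
    # Readers exist: only entries every reader has passed may be discarded.
    return min(reader_positions)


if __name__ == "__main__":
    print(cleanable_up_to([120, 80], False, 200))  # slowest reader decides
    print(cleanable_up_to([], False, 200))         # scenarios 1 and 2, no buffer
    print(cleanable_up_to([], True, 200))          # scenarios 1 and 2, buffer on
```

The second call corresponds to the two scenarios listed above: with no log reader and no buffer, updates can be cleaned before they were ever replicated. The third call shows why the buffer prevents that, and also why leaving it on makes the logs grow without bound.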

-- 
Renaud Delbru
