
Posted to oak-dev@jackrabbit.apache.org by Ian Boston <ia...@gmail.com> on 2017/05/04 10:02:47 UTC

MongoMK failover behaviour.

Hi,
What is the expected behaviour when an Oak MongoMK experiences a MongoDB
primary failure?

I am looking at an instance that appears to retry reads from the MongoDB
primary repeatedly. After 60s or more it reports that the Oak Discovery
lease has been lost, and after many minutes of further retries it eventually
shuts down the repository.

I don't currently have the MongoDB logs to share. Just wondering what to
expect at this stage?

I am starting from the assumption that Oak works perfectly in this regard.
Best Regards
Ian

Re: MongoMK failover behaviour.

Posted by Stefan Egli <st...@apache.org>.

Hi,

On 04/05/17 16:56, "Justin Edelson" <ju...@justinedelson.com> wrote:

>>Hmm, depending on the Oak version, this may also be caused by OAK-5528.
>> The current fix versions are 1.4.15 and 1.6.0.
>>
>
>Would this show up in thread dumps? Based on the description, it seems
>like it should.

Not necessarily. In OAK-5528 the lease update thread goes into
performLeaseCheck, which does a 5x1sec retry loop. So if the thread dump
is taken during that window one would see it; if taken afterwards, not.
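To make the timing point concrete, here is a minimal Java sketch of a bounded retry loop with the "5x1sec" shape described above. It is not Oak's performLeaseCheck: the class `LeaseRetrySketch`, the method `retryLeaseCheck` and the `BooleanSupplier` standing in for the lease check are hypothetical. It only illustrates that such a method sits on the thread's stack (and therefore shows up in a thread dump) while it is retrying, and disappears once it returns.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of a bounded "5 x 1 sec" retry loop like the one
 * described for OAK-5528. This is NOT Oak's performLeaseCheck; the class
 * and method names are illustrative only.
 */
public class LeaseRetrySketch {

    static final int MAX_ATTEMPTS = 5;        // "5x" from the thread
    static final long RETRY_DELAY_MS = 1000L; // "1sec" from the thread

    /**
     * Runs the check up to MAX_ATTEMPTS times, sleeping delayMs between
     * failed attempts, and returns true as soon as one attempt succeeds.
     * While this method is sleeping, a thread dump shows it on the stack;
     * once it has returned, it no longer appears.
     */
    static boolean retryLeaseCheck(BooleanSupplier check, long delayMs) {
        for (int attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (attempt < MAX_ATTEMPTS) {
                try {
                    Thread.sleep(delayMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated check that starts succeeding on the third attempt.
        boolean ok = retryLeaseCheck(() -> ++calls[0] >= 3, RETRY_DELAY_MS);
        // prints "true after 3 attempts" (after roughly 2 s of sleeping)
        System.out.println(ok + " after " + calls[0] + " attempts");
    }
}
```

With all five attempts failing, the loop sleeps four times, so the window in which a thread dump would catch it is roughly four seconds.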

Cheers,
Stefan

Re: MongoMK failover behaviour.

Posted by Justin Edelson <ju...@justinedelson.com>.

Hi,

On Thu, May 4, 2017 at 10:46 AM Marcel Reutegger <mr...@adobe.com> wrote:

> Hi,
>
> On 04/05/17 16:35, Ian Boston wrote:
> > Looks like there might be a problem with the MongoDB deployment in the
> > case I am looking at, either due to performance or misconfiguration.
> > Dropping a primary results in read queries failing, and after 120s the
> > Oak repositories shut down as they are not able to write. All that
> > points to the MongoDB driver config or the MongoDB instances, not Oak.
>
> Hmm, depending on the Oak version, this may also be caused by OAK-5528.
> The current fix versions are 1.4.15 and 1.6.0.
>

Would this show up in thread dumps? Based on the description, it seems like
it should.

Regards,
Justin

>
> Regards
>   Marcel
>

Re: MongoMK failover behaviour.

Posted by Marcel Reutegger <mr...@adobe.com>.

Hi,

On 04/05/17 16:35, Ian Boston wrote:
> Looks like there might be a problem with the MongoDB deployment in the case
> I am looking at, either due to performance or misconfiguration. Dropping a
> primary results in read queries failing, and after 120s the Oak repositories
> shut down as they are not able to write. All that points to the MongoDB
> driver config or the MongoDB instances, not Oak.

Hmm, depending on the Oak version, this may also be caused by OAK-5528. 
The current fix versions are 1.4.15 and 1.6.0.

Regards
  Marcel

Re: MongoMK failover behaviour.

Posted by Ian Boston <ie...@tfd.co.uk>.

Hi,

On 4 May 2017 at 15:19, Marcel Reutegger <mr...@adobe.com> wrote:

> Hi,
>
> On 04/05/17 14:57, Ian Boston wrote:
>
>> Before 120 seconds, should the MongoDB Java driver route read queries to a
>> secondary and use the new primary without any action by Oak (e.g. closing a
>> connection and opening a new one)?
>>
>
> Yes, the MongoDB Java driver automatically routes queries based on their
> required read preference. The failover is automatic and the driver should
> direct queries to the new primary once available. Connection pooling is
> done by the driver; Oak does not manage connections.
>

Thanks.
Looks like there might be a problem with the MongoDB deployment in the case
I am looking at, either due to performance or misconfiguration. Dropping a
primary results in read queries failing, and after 120s the Oak repositories
shut down as they are not able to write. All that points to the MongoDB
driver config or the MongoDB instances, not Oak.

Best Regards
Ian

>
> Regards
>  Marcel
>

Re: MongoMK failover behaviour.

Posted by Marcel Reutegger <mr...@adobe.com>.

Hi,

On 04/05/17 14:57, Ian Boston wrote:
> Before 120 seconds, should the MongoDB Java driver route read queries to a
> secondary and use the new primary without any action by Oak (e.g. closing a
> connection and opening a new one)?

Yes, the MongoDB Java driver automatically routes queries based on their
required read preference. The failover is automatic and the driver
should direct queries to the new primary once available. Connection
pooling is done by the driver; Oak does not manage connections.
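The routing described above can be illustrated with a toy model, assuming simplified selection rules. In a real deployment the read preference is configured on the driver (for example through the `readPreference` option of the MongoDB connection string); the enums, the `Member` class and the `select` method below are hypothetical and are not the driver's API. The model shows why primary-only reads have nowhere to go until a new primary is elected, while primaryPreferred reads can fall back to a secondary.

```java
import java.util.List;
import java.util.Optional;

/**
 * Toy model of read-preference-based server selection during a replica-set
 * failover. This is NOT the MongoDB Java driver API; all names here are
 * hypothetical and the rules are simplified (no tags, no latency window).
 */
public class ReadPreferenceSketch {

    enum Role { PRIMARY, SECONDARY, DOWN }

    enum Preference { PRIMARY, PRIMARY_PREFERRED, SECONDARY_PREFERRED }

    static final class Member {
        final String host;
        final Role role;

        Member(String host, Role role) {
            this.host = host;
            this.role = role;
        }
    }

    static Optional<Member> firstWithRole(List<Member> members, Role role) {
        return members.stream().filter(m -> m.role == role).findFirst();
    }

    /** Picks a member to serve a read, or empty if no suitable member exists. */
    static Optional<Member> select(List<Member> members, Preference pref) {
        Optional<Member> primary = firstWithRole(members, Role.PRIMARY);
        Optional<Member> secondary = firstWithRole(members, Role.SECONDARY);
        switch (pref) {
            case PRIMARY:
                // No fallback: until a new primary is elected, these reads cannot be served.
                return primary;
            case PRIMARY_PREFERRED:
                return primary.isPresent() ? primary : secondary;
            case SECONDARY_PREFERRED:
                return secondary.isPresent() ? secondary : primary;
            default:
                throw new IllegalArgumentException("unknown preference: " + pref);
        }
    }

    public static void main(String[] args) {
        // During failover: the old primary is down and no new primary is elected yet.
        List<Member> duringFailover = List.of(
                new Member("mongo1", Role.DOWN),
                new Member("mongo2", Role.SECONDARY),
                new Member("mongo3", Role.SECONDARY));
        System.out.println(select(duringFailover, Preference.PRIMARY).isPresent());          // false
        System.out.println(select(duringFailover, Preference.PRIMARY_PREFERRED).get().host); // mongo2

        // After the election: mongo2 has become the new primary.
        List<Member> afterElection = List.of(
                new Member("mongo1", Role.DOWN),
                new Member("mongo2", Role.PRIMARY),
                new Member("mongo3", Role.SECONDARY));
        System.out.println(select(afterElection, Preference.PRIMARY).get().host); // mongo2
    }
}
```

Writes always need the primary, so in this model, as in the thread, nothing can be written during the gap between losing the old primary and electing a new one.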

Regards
  Marcel

Re: MongoMK failover behaviour.

Posted by Ian Boston <ie...@tfd.co.uk>.

Hi,

On 4 May 2017 at 11:26, Marcel Reutegger <mr...@adobe.com> wrote:

> Hi,
>
> On 04/05/17 12:02, Ian Boston wrote:
>
>> What is the expected behaviour when an Oak MongoMK experiences a MongoDB
>> primary failure?
>>
>> I am looking at an instance that appears to retry reads from the MongoDB
>> primary repeatedly. After 60s or more it reports that the Oak Discovery
>> lease has been lost, and after many minutes of further retries it eventually
>> shuts down the repository.
>>
>> I don't currently have the MongoDB logs to share. Just wondering what to
>> expect at this stage?
>>
>
> Oak will stop the oak-core bundle if a MongoDB primary is unavailable for
> more than 110 seconds. The 110 seconds are based on the 120-second lease
> timeout and a lease update interval of 10 seconds.
>

Yes, that happens after 120s.


Before 120 seconds, should the MongoDB Java driver route read queries to a
secondary and use the new primary without any action by Oak (e.g. closing a
connection and opening a new one)?

Best Regards
Ian


>
> When this happens, all reads and writes to the repository will fail. In
> an OSGi environment, however, services depending on oak-core should stop
> as well. You will need to restart the system or affected bundles once the
> primary is available again. See also the discussion in OAK-3397 and OAK-3250.
>
> Regards
>  Marcel
>

Re: MongoMK failover behaviour.

Posted by Marcel Reutegger <mr...@adobe.com>.

Hi,

On 04/05/17 12:02, Ian Boston wrote:
> What is the expected behaviour when an Oak MongoMK experiences a MongoDB
> primary failure?
>
> I am looking at an instance that appears to retry reads from the MongoDB
> primary repeatedly. After 60s or more it reports that the Oak Discovery
> lease has been lost, and after many minutes of further retries it eventually
> shuts down the repository.
>
> I don't currently have the MongoDB logs to share. Just wondering what to
> expect at this stage?

Oak will stop the oak-core bundle if a MongoDB primary is unavailable 
for more than 110 seconds. The 110 seconds are based on the 120-second 
lease timeout and a lease update interval of 10 seconds.

When this happens, all reads and writes to the repository will fail. 
In an OSGi environment, however, services depending on oak-core should 
stop as well. You will need to restart the system or affected bundles 
once the primary is available again. See also the discussion in OAK-3397 
and OAK-3250.
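The 110-second figure can be worked through as simple arithmetic. The sketch below assumes one reading of it: the last successful lease update may have happened up to one update interval (10 s) before the primary went away, so the lease may have as little as 120 - 10 = 110 s left. The class `LeaseTimingSketch` and its methods `remainingLease` and `leaseExpired` are hypothetical names for illustration, not Oak code.

```java
import java.time.Duration;

/**
 * Worked version of the lease arithmetic. Reading "110 seconds" as the
 * worst-case remaining lease time is an assumption, not taken from Oak's code.
 */
public class LeaseTimingSketch {

    static final Duration LEASE_TIMEOUT = Duration.ofSeconds(120);  // from the thread
    static final Duration UPDATE_INTERVAL = Duration.ofSeconds(10); // from the thread

    /** Lease time left when the outage starts, given the time since the last successful update. */
    static Duration remainingLease(Duration leaseTimeout, Duration sinceLastUpdate) {
        return leaseTimeout.minus(sinceLastUpdate);
    }

    /** True if an outage of the given length outlasts the remaining lease. */
    static boolean leaseExpired(Duration outage, Duration sinceLastUpdate) {
        return outage.compareTo(remainingLease(LEASE_TIMEOUT, sinceLastUpdate)) > 0;
    }

    public static void main(String[] args) {
        // Worst case: the outage starts one full update interval after the last update.
        System.out.println(remainingLease(LEASE_TIMEOUT, UPDATE_INTERVAL).getSeconds() + " s"); // 110 s
        // Best case: the outage starts right after a successful update.
        System.out.println(remainingLease(LEASE_TIMEOUT, Duration.ZERO).getSeconds() + " s");   // 120 s
        // A 115 s outage is fatal in the worst case but not in the best case.
        System.out.println(leaseExpired(Duration.ofSeconds(115), UPDATE_INTERVAL)); // true
        System.out.println(leaseExpired(Duration.ofSeconds(115), Duration.ZERO));   // false
    }
}
```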

Regards
  Marcel