Posted to oak-dev@jackrabbit.apache.org by Mostafa Mahdieh <m....@gmail.com> on 2017/08/01 13:45:45 UTC

single node cluster

Hi,

I'm using Jackrabbit Oak as the content repository of a document management
system product. There is currently no need to scale out, so I'm running
Oak in a single-node environment. However, I'm experiencing issues related
to clustering and lease time, such as the following exception, which
appears all over my Tomcat logs:

WARN: Background operation failed:
org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: This oak
instance failed to update the lease in time and can therefore no longer
access this DocumentNodeStore.

After some research, it seems that there is no way to force Jackrabbit Oak
to run as a single node without any clustering-related concerns.

Am I using the right tool? I thought Jackrabbit 2 might be better for my
current use case, but Oak appears to be the future of Jackrabbit, which
attracted me (adding scalability is also part of my future plans). Do you
suggest Oak or Jackrabbit 2 for my use case? How can I adapt Oak for a
single-node environment without running into lease time and clustering
issues?

Best Regards
-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
I am using Oak in a Java EE container (Tomcat). A session is created per
thread (i.e. per HTTP request). Is there anything else I can provide to
help track down the issue?


-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
Thanks. I'm using MongoDB, and the database runs on the same machine, so
the connection seems reliable enough (I see no database connection
exceptions). File system persistence seems like a good option for me;
however, the transparency and visibility of MongoDB were the reasons I
chose it. It seems that the database connection is not the issue here.


-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Julian Reschke <ju...@gmx.de>.
On 2017-08-01 15:45, Mostafa Mahdieh wrote:
> Hi,
> 
> I'm using jackrabbit oak as the content repository of a document management
> system product. Currently there is no need to scale out, therefore I'm
> using jackrabbit oak in a single node environment. However, I'm
> experiencing issues related to clustering and lease time, such as the
> following exception which is appearing all over my tomcat logs:
> 
> WARN: Background operation failed:
> org.apache.jackrabbit.oak.plugins.document.DocumentStoreException: This oak
> instance failed to update the lease in time and can therefore no longer
> access this DocumentNodeStore.
> 
> After some research, It seems that there is no way to use jackrabbit oak
> forcing it to use a single node and not having any concerns related to
> clustering.
> ...

Well, a single node is just a very small cluster :-).

If you see the message above, apparently your connection to the 
persistence (is it Mongo or RDB?) isn't reliable, or the DB itself is 
flaky. The log files should show you more.

That said, you can use Oak with filesystem-based persistence as well, in 
which case this class of issues would go away (but then you'd need to 
migrate the persistence if you later decide you need to scale to a real 
cluster).

Best regards, Julian


Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
I feel that I need to understand the design rationale behind clustering
and leases, and I can't find enough documentation on that topic.

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Julian Reschke <ju...@gmx.de>.
On 2017-08-07 18:48, Mostafa Mahdieh wrote:
> OK, that was just an example. There are other reasonable scenarios, such as
> the database temporarily being unavailable for 2-3 minutes. How can I
> recover the connection in that situation.

In that case, you'll have to restart Oak.
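
One way to make such a restart possible without redeploying the whole web
application is to keep the repository behind a small holder that can
dispose of the old instance and build a new one. The Java sketch below is a
generic illustration only: RestartableResource, its factory and its closer
are hypothetical names, and the Oak-specific construction and disposal
calls are deliberately left to the caller.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical holder; not part of Oak.
final class RestartableResource<T> {
    private final Supplier<T> factory; // builds a fresh instance
    private final Consumer<T> closer;  // disposes of an old instance
    private T current;

    RestartableResource(Supplier<T> factory, Consumer<T> closer) {
        this.factory = factory;
        this.closer = closer;
        this.current = factory.get();
    }

    // Returns the instance that is currently in use.
    synchronized T get() {
        return current;
    }

    // Disposes of the current instance, then builds a new one.
    // The new instance is built even if disposal throws.
    synchronized void restart() {
        try {
            closer.accept(current);
        } finally {
            current = factory.get();
        }
    }
}
```

Anything obtained from the old instance (for example, open sessions) would
have to be discarded and re-acquired through get() after a restart.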

Best regards, Julian

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
OK, that was just an example. There are other reasonable scenarios, such as
the database being temporarily unavailable for 2-3 minutes. How can I
recover the connection in that situation?

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Julian Reschke <ju...@gmx.de>.
On 2017-08-07 16:11, Mostafa Mahdieh wrote:
> Consider that for some reason the session lease is expired and cannot be 
> used. For example one scenario is that the the cluster node is removed 
> from the clusterNodes collection (I couldn't find out all scenarios that 

That's not supposed to happen at all.

> ...

Best regards, Julian

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
Suppose that for some reason the session lease has expired and cannot be
used. For example, one scenario is that the cluster node is removed from
the clusterNodes collection (I couldn't find out all the scenarios in which
this happens). The background lease update thread then detects this and
starts logging exceptions. If we could somehow detect this state (e.g. with
an observer) and reconnect, I could recover the application from it. Is
this possible? I'm using Jackrabbit Oak 1.6.1.

Regards

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Julian Reschke <ju...@gmx.de>.
On 2017-08-07 16:00, Mostafa Mahdieh wrote:
> Can anyone please help me on Marcel's comment? Is it already implemented?
> I really need your help.
> ...

See:

> Fix Version/s:
> 1.2.10, 1.3.13, 1.4 


So yes, this was changed long ago.

Once again: what you should try to find out is why the lease update 
fails on your machine.

Best regards, Julian

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
Can anyone please help me with Marcel's comment? Is it already implemented?
I really need your help.

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
There is a comment here (https://issues.apache.org/jira/browse/OAK-3424)
from Marcel Reutegger
<https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mreutegg>,
suggesting that:

*"In my view we should change the current behavior and prohibit a deployment
where two cluster nodes are running with the same working directory and an
automatic clusterId is requested."*

It seems that this behavior matches my scenario. How can I configure the
Oak connection so that this happens?

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Julian Reschke <ju...@gmx.de>.
On 2017-08-06 09:08, Mostafa Mahdieh wrote:
> Thanks for your suggestions.
> 
> I'm wondering what happens in the worst case if I disable the lease check.
> ...

Repository corruption.

What you need to find out is why the lease renewal fails, not how to 
turn it off. :-)

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
I just don't completely understand all the design thinking behind the lease
time idea. I've read the conversations here, but it's still not all clear.

https://issues.apache.org/jira/browse/OAK-3424
https://issues.apache.org/jira/browse/OAK-3449
https://issues.apache.org/jira/browse/OAK-2739

I'm trying hard to understand what's going on from the source code. 😑😥

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Mostafa Mahdieh <m....@gmail.com>.
Thanks for your suggestions.

I'm wondering what happens in the worst case if I disable the lease check.

If multiple instances of Jackrabbit are run using the same clusterId, what
are the consequences? Is this only a logging issue, or does it have worse
consequences?

-- 
Mostafa Mahdieh

Re: single node cluster

Posted by Stefan Egli <st...@apache.org>.
Hi Mostafa,

I'd suggest narrowing down why that lease update failed, especially if you
can reproduce it. By default a lease is updated every 10 seconds and is
valid for 2 minutes (in theory this could be changed, but that is not
necessarily recommended).

Besides the DB issues already mentioned, other cases where lease updates
failed involved JVMs running low on memory and therefore spending too long
in stop-the-world GC pauses.
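
To make the timing concrete, here is a minimal Java sketch of how a lease
with these defaults behaves when the renewing process is paused. It is a
toy model, not Oak's implementation: the class LeaseModel and its methods
are hypothetical, and only the 10-second interval and 2-minute validity
come from the numbers above. Any safety margins the real code may apply are
ignored.

```java
import java.time.Duration;

// Toy model of lease renewal; hypothetical, not Oak's actual code.
final class LeaseModel {
    // Defaults quoted above: renew every 10 s, lease valid for 2 min.
    static final Duration RENEW_INTERVAL = Duration.ofSeconds(10);
    static final Duration LEASE_DURATION = Duration.ofMinutes(2);

    private long leaseEndMillis;

    // A fresh lease runs for LEASE_DURATION from the start time.
    LeaseModel(long nowMillis) {
        this.leaseEndMillis = nowMillis + LEASE_DURATION.toMillis();
    }

    // True once the current time has reached the end of the lease.
    boolean isExpired(long nowMillis) {
        return nowMillis >= leaseEndMillis;
    }

    // A renewal that arrives in time extends the lease from "now";
    // a renewal that arrives after expiry is rejected.
    boolean renew(long nowMillis) {
        if (isExpired(nowMillis)) {
            return false;
        }
        leaseEndMillis = nowMillis + LEASE_DURATION.toMillis();
        return true;
    }
}
```

In this model a renewal at 10 s moves the lease end to 130 s, so a pause of
121 s right after that renewal (a long stop-the-world GC, or an unreachable
database) makes the next renewal at 131 s arrive too late.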

If you can rule out both, then here are some more ideas to investigate:

a) check for warnings in the form of: "BackgroundLeaseUpdate.execute: time
since last renewClusterIdLease() call longer than expected" to see if the
lease update became slow already before it finally expired. Perhaps that
gives some clues already.

b) enable trace logging for
'org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo' to see all
details about lease updates happening (or not).

c) analyse thread dumps to rule out a blocked lease update thread
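
Steps b) and c) can be sketched in Java as follows. This is only an
illustration under assumptions: it presumes that Oak's SLF4J logging is
routed to java.util.logging (where SLF4J TRACE corresponds to
Level.FINEST), which may not match a given Tomcat setup, and the class
LeaseDiagnostics and the thread-name fragment passed to it are
hypothetical. With Logback or Log4j the level would be set in that
framework's configuration instead.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not part of Oak.
final class LeaseDiagnostics {
    // Logger name taken from suggestion b).
    static final String CLUSTER_NODE_INFO =
            "org.apache.jackrabbit.oak.plugins.document.ClusterNodeInfo";

    // Strong reference: java.util.logging holds loggers only weakly.
    private static final Logger LOGGER = Logger.getLogger(CLUSTER_NODE_INFO);

    // Assumption: SLF4J is bound to java.util.logging (TRACE -> FINEST).
    // Handlers filter by level too, so they may also need to be lowered.
    static Logger enableTrace() {
        LOGGER.setLevel(Level.FINEST);
        return LOGGER;
    }

    // Lists live threads whose name contains the fragment and that are
    // currently BLOCKED, together with the lock they are waiting for.
    static List<String> blockedThreads(String nameFragment) {
        String needle = nameFragment.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        ThreadInfo[] infos =
                ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // thread ended while the dump was taken
            }
            String name = info.getThreadName();
            boolean matches = name.toLowerCase(Locale.ROOT).contains(needle);
            if (matches && info.getThreadState() == Thread.State.BLOCKED) {
                result.add(name + " blocked on " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return result;
    }
}
```

A call such as LeaseDiagnostics.blockedThreads("lease") could be run
periodically; the fragment "lease" is a guess at the thread name, so check
a real thread dump for the actual name first.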

Cheers,
Stefan
