Posted to users@jackrabbit.apache.org by Christian Stocker <ch...@liip.ch> on 2011/04/14 11:20:36 UTC

Add new nodes to a cluster

Hi

We're currently building a Jackrabbit setup that gets quite a lot of new
content every day. We also want to use clustering and be able to add new
instances should the need arise. Judging from
http://wiki.apache.org/jackrabbit/Clustering, adding a completely new
instance seems to be not that easy without replaying the whole journal
from the beginning (which can, of course, take ages).

Is there an easy, recommended way to add new instances without having to
replay the whole journal? The "Janitor" section contains this remark:

"If the janitor is enabled then you lose the possibility to easily add
cluster nodes. (It is still possible but takes detailed knowledge of
Jackrabbit.)"

Is this still the case? What "detailed knowledge" do we need to just
"clone" a running instance and add it to the cluster?

(Using the janitor would certainly make sense in our use case.)

Any hints are much appreciated.

christian

-- 
Liip AG  //  Feldstrasse 133 //  CH-8004 Zurich
Tel +41 43 500 39 81 // Mobile +41 76 561 88 60
www.liip.ch // blog.liip.ch // GnuPG 0x0748D5FE

Re: Add new nodes to a cluster

Posted by Markus Blaurock <bl...@dig.de>.

We are using Jackrabbit in production, running in a cluster.
The journal contains no "data"; it is only used for keeping
the Lucene indexes in sync.
-> When you add a new node to the cluster, you build a new
Lucene index from scratch. The journal is only needed from the moment
the new node is added to the cluster.
Therefore: if building the new index takes a long time, make sure the
janitor does not delete journal entries created after the new node was started.
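A minimal sketch of the janitor point above, in plain Python rather than Jackrabbit code. It assumes (this is our understanding of DatabaseJournal and is not confirmed in this thread) that the janitor only deletes journal entries whose revision is lower than the lowest revision recorded for any cluster node in the local revisions table. The function name and revision numbers are hypothetical.

```python
def purgeable_revisions(journal_revisions, local_revisions):
    """Return the journal revisions a janitor-style cleanup could delete.

    Assumption: an entry may be deleted once every *registered* cluster
    node has processed it, i.e. its revision is below the minimum of the
    nodes' local revisions. Unregistered nodes are invisible to this rule.
    """
    if not local_revisions:
        return []  # no node is registered, so keep everything
    low_water_mark = min(local_revisions.values())
    return [rev for rev in journal_revisions if rev < low_water_mark]


journal = list(range(1, 11))        # hypothetical revisions 1..10
nodes = {"node1": 10, "node2": 9}   # existing nodes, both nearly up to date

# Before the clone is registered, revisions 1..8 look safe to delete.
print(purgeable_revisions(journal, nodes))  # -> [1, 2, 3, 4, 5, 6, 7, 8]

# Registering the clone at the revision its copy was taken (say 6)
# keeps revisions 6..10 around for it to replay.
nodes["node3"] = 6
print(purgeable_revisions(journal, nodes))  # -> [1, 2, 3, 4, 5]
```

Under this assumption, registering the new node's revision before the janitor's next run is what protects the entries it still needs.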

We were also a bit worried about the comment about
"easily adding cluster nodes" in the wiki, but we went ahead nevertheless.
And it worked: it all comes down to having an index that matches your
persistent data.

Markus


Kind regards
Markus Blaurock

-- 
DIG GmbH
Neckarstraße 1/5, 78727 Oberndorf am Neckar, Phone: +49 7423 8750 62
Commercial register: Amtsgericht Stuttgart HRB 480914
Managing Director: Carsten Huber

Re: Add new nodes to a cluster

Posted by Christian Stocker <ch...@liip.ch>.

Hi

OK, I've written that blog post:

http://blog.liip.ch/archive/2011/05/10/add-new-instances-to-your-jackrabbit-cluster-the-non-time-consuming-way.html

I'll also try to add something to the wiki.

chregu

On 08.05.11 12:36, Christian Stocker wrote:

Re: Add new nodes to a cluster

Posted by Christian Stocker <ch...@liip.ch>.

On 08.05.11 12:28, Michael Wechner wrote:
> Hi
> 
> Thanks very much for this summary.
> 
> Maybe you can add it to the Wiki
> 
> http://wiki.apache.org/jackrabbit/Clustering

I'm currently writing a blog post with more information; after that, I'll
add some of it to the wiki.

chregu


Re: Add new nodes to a cluster

Posted by Michael Wechner <mi...@wyona.com>.

Hi

Thanks very much for this summary.

Maybe you can add it to the Wiki

http://wiki.apache.org/jackrabbit/Clustering

;-)

Thanks

Michael

On 5/3/11 9:18 AM, Christian Stocker wrote:

Re: Add new nodes to a cluster

Posted by Christian Stocker <ch...@liip.ch>.

Hi

Just a quick update on that: the following procedure seems to work:

- shut down the instance
- get its revision number from JOURNAL_LOCAL_REVISIONS
- cp -r jackrabbit jackrabbit.bkup
- move jackrabbit.bkup to another server
- change the repository configuration of the copy to use a new node name
  in the cluster config
- add a row for that new node name to JOURNAL_LOCAL_REVISIONS, with the
  number from above
- done ;)
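Steps 2 and 6 of this procedure can be scripted. The sketch below uses Python's built-in sqlite3 module only as a stand-in for the real journal database; with MySQL, PostgreSQL etc. you would use that database's driver instead. It assumes a table JOURNAL_LOCAL_REVISIONS with columns JOURNAL_ID (the cluster node id) and REVISION_ID. We believe this is what DatabaseJournal creates with a schemaObjectPrefix of JOURNAL_, but check your own schema first. The helper name, node ids and revision number are hypothetical.

```python
import sqlite3


def clone_local_revision(conn, source_id, new_id):
    """Register new_id in the local revisions table at source_id's revision.

    Assumed schema: JOURNAL_LOCAL_REVISIONS(JOURNAL_ID, REVISION_ID).
    Run it while the source instance is shut down, so its revision cannot
    move between copying the repository directory and this insert.
    """
    row = conn.execute(
        "SELECT REVISION_ID FROM JOURNAL_LOCAL_REVISIONS WHERE JOURNAL_ID = ?",
        (source_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no local revision recorded for {source_id!r}")
    taken = conn.execute(
        "SELECT 1 FROM JOURNAL_LOCAL_REVISIONS WHERE JOURNAL_ID = ?",
        (new_id,),
    ).fetchone()
    if taken is not None:
        raise ValueError(f"node id {new_id!r} is already registered")
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO JOURNAL_LOCAL_REVISIONS (JOURNAL_ID, REVISION_ID) "
            "VALUES (?, ?)",
            (new_id, row[0]),
        )
    return row[0]


# Demo against an in-memory database with hypothetical ids and revision.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE JOURNAL_LOCAL_REVISIONS "
    "(JOURNAL_ID VARCHAR(255) NOT NULL, REVISION_ID BIGINT NOT NULL)"
)
conn.execute("INSERT INTO JOURNAL_LOCAL_REVISIONS VALUES ('node1', 4711)")
conn.commit()
print(clone_local_revision(conn, "node1", "node2"))  # -> 4711
```

The duplicate-id check is there so that a mistyped new node name cannot silently create a second row for an existing node.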


I'll try to write a little script for that and document it.
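Step 5 (giving the copy a new node name) could go into such a script as well. This is a minimal sketch that assumes the node name is the id attribute of a single <Cluster> element directly under <Repository> in repository.xml. The sample configuration is hypothetical and heavily trimmed. With the option used here, ElementTree keeps comments but drops a DOCTYPE declaration, so compare the result with your original file. Requires Python 3.8+.

```python
import xml.etree.ElementTree as ET


def set_cluster_node_id(xml_text, new_id):
    """Return (old_id, new_xml_text) with <Cluster id="..."> set to new_id.

    Assumes one <Cluster> element directly under the root element.
    Comments survive (insert_comments=True); a DOCTYPE does not.
    """
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    cluster = root.find("Cluster")
    if cluster is None:
        raise ValueError("no <Cluster> element under the root element")
    old_id = cluster.get("id")
    cluster.set("id", new_id)
    return old_id, ET.tostring(root, encoding="unicode")


sample = """<Repository>
  <!-- cluster settings for the cloned instance -->
  <Cluster id="node1" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal"/>
  </Cluster>
</Repository>"""

old_id, updated = set_cluster_node_id(sample, "node2")
print(old_id)  # -> node1
print(updated)
```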

Thanks for all the input

christian

On 19.04.11 14:24, Christian Stocker wrote:

Re: Add new nodes to a cluster

Posted by Christian Stocker <ch...@liip.ch>.

Hi all

Thanks a lot for your answers. Looks good to me, and we will proceed ;)

christian

On 19.04.11 13:46, Jeroen Reijn wrote:

Re: Add new nodes to a cluster

Posted by Jeroen Reijn <j....@onehippo.com>.

On Mon, Apr 18, 2011 at 3:18 PM, Christian Stocker <christian.stocker@liip.ch> wrote:

>
>
> On 18.04.11 11:04, Jeroen Reijn wrote:
> > I'm also not 100% sure, but I can second Alex's answer.
> > From what I've seen, the new cluster node will start from the persisted
> > data and will continue from there using the journal.
>
> The question is then: how does the new cluster node know at which
> position it should start? Or do we "just" have to make sure that
> nothing writes between the start of the new node and "when it's ready"?
> How do we know it's ready then?
>

If you're using a database, I know that there is a table in which Jackrabbit
stores the global revision, which is the latest revision. All the nodes in
the cluster work towards that revision based on the repository journal.
My best bet would be that when a new node in the cluster starts, it starts
from this global revision.

Besides the global_revision table, there is also a local_revision table that
contains the current revision of each node in the cluster.
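To make the revision bookkeeping concrete, here is a toy model in plain Python, not Jackrabbit's implementation. It assumes a node replays, oldest first, every journal entry newer than its local revision and then records the last revision it applied. Under that assumption it also suggests one answer to "how do we know it's ready?": a node has caught up when its local revision equals the global revision. All names and journal entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NodeModel:
    """Toy stand-in for one cluster node following a shared journal."""
    node_id: str
    local_revision: int
    applied: list = field(default_factory=list)

    def sync(self, journal):
        """Apply journal entries newer than local_revision, oldest first."""
        for revision, change in sorted(journal.items()):
            if revision > self.local_revision:
                self.applied.append(change)  # stands in for cache/index updates
                self.local_revision = revision
        return self.local_revision

    def is_caught_up(self, global_revision):
        return self.local_revision == global_revision


journal = {1: "add /a", 2: "add /b", 3: "remove /a"}  # revision -> change
global_revision = max(journal)

clone = NodeModel("node3", local_revision=2)  # seeded from the copied node
clone.sync(journal)
print(clone.applied, clone.is_caught_up(global_revision))  # -> ['remove /a'] True

fresh = NodeModel("node4", local_revision=0)  # model assumption: unseeded = 0
fresh.sync(journal)
print(fresh.applied)  # -> ['add /a', 'add /b', 'remove /a']
```

In this model, the seeded clone replays only the entries after its copy, while an unseeded node replays everything. That difference is what the copy-and-register procedure is meant to exploit.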



>
> But thanks for the input so far
>
> christian

Re: Add new nodes to a cluster

Posted by Jeroen Reijn <j....@onehippo.com>.

I'm also not 100% sure, but I can second Alex's answer.
From what I've seen, the new cluster node will start from the persisted data
and will continue from there using the journal.

Jeroen

On Mon, Apr 18, 2011 at 10:52 AM, Michael Wechner <michael.wechner@wyona.com> wrote:

Re: Add new nodes to a cluster

Posted by Michael Wechner <mi...@wyona.com>.

Hi

I am not sure whether you have already received an answer on this, but it
seems somebody asked the same question some time ago:

http://web.archiveorange.com/archive/v/L1HBQddB3PYM2Fsde3ZL

(with one answer that is not entirely certain).

On the other hand, you might just give it a try and see what it does.

I would also be interested in your findings.

Thanks

Michael
