Posted to users@jackrabbit.apache.org by Patricio Echagüe <pa...@gmail.com> on 2009/12/18 21:51:45 UTC
Re: Clustering-Issues

Hi Christian, I have a question regarding your strategy. I'm sure you
mentioned it but just to clarify.

If you somehow start up a new JR instance initialized with the
GLOBAL_REVISION, the only way to keep the indexes up-to-date is by
initializing the new instance with the latest index snapshot, right?

On Wed, Oct 28, 2009 at 5:31 AM, Christian Wurbs <Christian.Wurbs@itemic.com> wrote:

> Hi,
>
> I think issue JCR-1087 is about the janitor feature of the DatabaseJournal
> class.
>
> # janitorEnabled: specifies whether the clean-up thread for the journal
> table is enabled (default = false)
> # janitorSleep: specifies the sleep time of the clean-up thread in seconds
> (only useful when the clean-up thread is enabled, default = 24 * 60 * 60,
> which equals 24 hours)
> # janitorFirstRunHourOfDay: specifies the hour at which the clean-up thread
> initiates its first run (default = 3 which means 3:00 at night)
>
> I already "used" it but it seemed not to work.
>
> The second caveat of the following comment
>
> https://issues.apache.org/jira/browse/JCR-1087?focusedCommentId=12569875&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12569875
>
> is the reason why the janitor seemed not to work in my case.
>
> I removed obsolete cluster node ids (which had a very small local revision
> number) so the janitor could now do its job. Thanks for the JCR-1087 hint.
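
Why a single obsolete node id can stall the clean-up might be sketched like this. It is an illustration only, not Jackrabbit's code: it assumes the janitor may delete only journal records below the smallest local revision recorded for any cluster node id, and the function name, node ids and revision numbers are hypothetical.

```python
def janitor_cutoff(local_revisions):
    """Return the revision below which journal records could be deleted.

    Assumption: a record may be removed only once every registered
    cluster node id has consumed it, so the cutoff is the minimum
    local revision over all node ids.
    """
    if not local_revisions:
        return 0
    return min(local_revisions.values())


# Hypothetical node ids and local revisions.
revisions = {"node-a": 99950, "node-b": 100000, "old-node": 12}

# The obsolete node id holds the cutoff down at its own small revision.
print(janitor_cutoff(revisions))

# Once the obsolete node id is removed, the cutoff moves up.
del revisions["old-node"]
print(janitor_cutoff(revisions))
```

Under that assumption, almost nothing can be deleted while "old-node" is still registered, which matches the behaviour described above.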
>
> But still there is the question why a new cluster node initializes with
> Revision 0 instead of GLOBAL_REVISION?
> The code at DatabaseJournal.initInstanceRevisionAndJanitor looks like it
> was easier to implement that way.
> I think in "normal/production" cases this is sufficient, since new nodes
> only need to replay some journal entries - if the janitor works.
>
>
> Regarding the "manual" deletion of permanently obsolete cluster nodes, I'm
> going to implement some clean-up based on the last local revision update
> time per node id.
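
One possible shape for such a clean-up is sketched below. The thread does not say how it would be implemented, so everything here is an assumption: a last-update timestamp is taken to be available for each node id from somewhere, and the function name, node ids and the seven-day threshold are hypothetical.

```python
from datetime import datetime, timedelta


def stale_node_ids(last_update, now, max_age):
    """Return node ids whose local revision has not been updated within max_age."""
    return sorted(
        node_id
        for node_id, updated_at in last_update.items()
        if now - updated_at > max_age
    )


# Hypothetical last-update times per node id.
last_update = {
    "node-a": datetime(2009, 10, 27, 23, 0),
    "node-b": datetime(2009, 10, 28, 4, 0),
    "old-node": datetime(2009, 8, 1, 12, 0),
}

candidates = stale_node_ids(
    last_update,
    now=datetime(2009, 10, 28, 5, 31),
    max_age=timedelta(days=7),
)
print(candidates)  # node ids to review before deleting their entries
```

The result is only a list of candidates; a node that is merely switched off for a while would show up too, so the threshold has to be longer than any planned downtime.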
>
> Thanks for your help.
>
>
> Christian Wurbs
>
> itemic AG
> Am Brauhaus 8a
> 01099 Dresden
> Christian.Wurbs@itemic.de
> Tel.: +49 (351) 26622-23
> Fax.: +49 (351) 26622-20
> Vorstand . Torsten Werneke . Aufsichtsratsvorsitzender . Walter Gunz . Sitz
> der Gesellschaft . Dresden . Handelsregister . Amtsgericht Dresden . HRB
> 19383
>
> -----Original Message-----
> From: Ian Boston [mailto:ianboston@googlemail.com] On Behalf Of Ian Boston
> Sent: Tuesday, 27 October 2009 21:28
> To: users@jackrabbit.apache.org
> Subject: Re: Clustering-Issues
>
>
> On 27 Oct 2009, at 16:38, Thomas Müller wrote:
>
> > Hi,
> >
> >> two cluster nodes working for a while.
> >> 100000 revisions in the datastore.
> >> add a third cluster node
> >> it's replaying 100000 journal entries
> >> Is there a way of having the third (new) cluster node start at the
> >> latest Global-Revision immediately?
> >
> > There seems to be a related feature:
> > https://issues.apache.org/jira/browse/JCR-1087 - I'm not sure if this
> > will solve the problem however (I don't really know this feature)
>
> We have been running in production with a similar solution to
> JCR-1087. We have a perl script that creates a consistent snapshot of
> the local disk state (through repetitive rsyncs) and stores that
> snapshot on a central server.
>
> When a new node comes up, it pulls the snapshot from the central
> server, adjusts some of the settings and starts the JVM up. At this
> point jackrabbit replays the part of the journal since the snapshot was
> taken.
>
> When the snapshots are stored, we look into the local revisions file,
> extract the revision and store it. A separate process then deletes
> journal records from the database prior to the earliest snapshot,
> hence keeping the size of the journal down, and the startup time down.
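
The bookkeeping behind this scheme can be sketched as follows. This is not the Perl script mentioned above, only a minimal illustration with hypothetical function names and revision numbers, assuming each stored snapshot is tagged with the revision read from the local revisions file and that journal records can be compared by revision number.

```python
def entries_to_replay(journal_revisions, snapshot_revision):
    """Revisions a node restored from a snapshot still has to apply."""
    return [r for r in journal_revisions if r > snapshot_revision]


def prunable_entries(journal_revisions, snapshot_revisions):
    """Revisions older than the earliest retained snapshot."""
    earliest = min(snapshot_revisions)
    return [r for r in journal_revisions if r < earliest]


# Hypothetical data: ten journal records and three stored snapshots.
journal = list(range(1, 11))
snapshots = [7, 4, 9]

# A new node started from the newest snapshot replays only the tail.
print(entries_to_replay(journal, max(snapshots)))

# Records before the earliest snapshot are not needed by any snapshot.
print(prunable_entries(journal, snapshots))
```

Keeping older snapshots around therefore has a cost: the earliest one retained decides how much of the journal has to stay in the database.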
>
> The JVMs (we use between 3 and 8, depending on the load at the time) are
> hosted on Xen-based Linux virtual machines, and over the past 18
> months in production I believe we have recreated the JVMs many times
> with no problems (or at least none that I've been told about).
>
> Although the approach is a little agricultural and the repetitive
> rsync can take a while to get a solid snapshot (and we do sometimes
> get a bad one when the indexes are halfway through some optimization),
> it works with JR 1.4, and we get at least 3 parallel snapshots of the
> local node state at any one time (in fact we keep several old versions
> for each node). The nice part is that the JR startup script always starts
> from a snapshot, so the startup time is always acceptable.
>
> Looking at the comments on JCR-1087 it does some of the same things.
>
> Ian
>
> >
> >> If I temporarily shut down the second cluster node I receive the
> >> following error messages during synchronization at restarting this
> >> second node:
> >
> > I am not sure, it sounds like a bug... Could you create a Jira issue
> > for this, together with a simple reproducible test case?
> >
> > Regards,
> > Thomas
>
>
>


-- 
Patricio.-