Posted to users@jackrabbit.apache.org by liang cheng <lc...@gmail.com> on 2013/06/05 10:59:06 UTC

Re: about removing Old Revisions from journal table.

Could someone kindly give me some help? Thanks.

Regards,
-Liang


2013/5/29 liang cheng <lc...@gmail.com>

> Hi all,
>   In our production environment, the Jackrabbit journal table grows large
> (more than 100,000 records) after running for two weeks. As a result, we
> plan to use the janitor thread to remove old revisions, as described in
> http://wiki.apache.org/jackrabbit/Clustering#Removing Old
> Revisions.
>   Enabling it comes with several caveats, which the wiki page also lists:
>   1. If the janitor is enabled then you lose the possibility to easily
> add cluster nodes. (It is still possible but takes detailed knowledge of
> Jackrabbit.)
>   2. You must make sure that all cluster nodes have written their local
> revision to the database before the clean-up task runs for the first time,
> because otherwise cluster nodes might miss updates (because they have been
> purged) and their local caches and search indexes get out of sync.
>   3. If a cluster node is removed permanently from the cluster, then its
> entry in the LOCAL_REVISIONS table should be removed manually. Otherwise,
> the clean-up thread will not be effective.
>
>   I understand point #3, but I am not sure about #1 and #2.
>
>   #1 is our biggest concern. In our production environment, we sometimes
> need to add new cluster nodes, e.g. when the system cannot handle the
> current workload, or when a running node has to be stopped for maintenance
> for a while and a new node has to take its place. For #1, the wiki only
> says that "you lose the possibility to easily add cluster nodes", but does
> not explain why. As far as I know, when a new node is added to the JR
> cluster, it has no Lucene index, so Jackrabbit builds the index for the
> whole current repository (starting from the root node). After this step,
> Jackrabbit processes the revisions generated by the other nodes. *I wonder
> what could go wrong when old revisions are processed against the latest
> repository content in the cache and indexes?*
>
>   For #2, *does it mean that manual work is needed to keep the nodes
> consistent?*
>
>
>
>   Although the wiki page gives one approach for adding a new cluster node
> manually (i.e. cloning the indexes and the local revision number from an
> existing node), we still hope there is a safe programmatic way to avoid
> the manual work, because our production system is deployed on Amazon EC2
> and adding a new node needs to be as easy as possible.
>
>   Could you please comment on my concerns? Thanks.
>
>
> Regards,
>
> -Liang
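
To make caveats #2 and #3 from the quoted message concrete, here is a minimal
Java sketch of the clean-up rule they seem to imply. It assumes the janitor
purges every journal record below the smallest revision recorded in
LOCAL_REVISIONS, and that revision numbers are consecutive. The class
JanitorSketch, its methods, and the node names are hypothetical; it is an
in-memory model and does not use any Jackrabbit API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of the clean-up rule; this is not Jackrabbit code. */
public class JanitorSketch {

    /** Assumed rule: the smallest local revision recorded by any cluster node. */
    public static long purgeThreshold(Map<String, Long> localRevisions) {
        return localRevisions.isEmpty() ? 0L : Collections.min(localRevisions.values());
    }

    /** Deletes journal records strictly below the threshold; returns how many were removed. */
    public static int purge(SortedMap<Long, String> journal, Map<String, Long> localRevisions) {
        SortedMap<Long, String> old = journal.headMap(purgeThreshold(localRevisions));
        int removed = old.size();
        old.clear(); // headMap is a view, so clearing it removes the records from the journal
        return removed;
    }

    /**
     * True if the oldest remaining record is not newer than the first record the
     * node still has to read. An empty journal is treated as nothing to replay.
     */
    public static boolean canCatchUp(SortedMap<Long, String> journal, long nodeRevision) {
        return journal.isEmpty() || journal.firstKey() <= nodeRevision + 1;
    }

    public static void main(String[] args) {
        SortedMap<Long, String> journal = new TreeMap<>();
        for (long r = 1; r <= 10; r++) {
            journal.put(r, "change-" + r);
        }

        Map<String, Long> localRevisions = new HashMap<>();
        localRevisions.put("node1", 10L);
        localRevisions.put("node2", 8L);
        localRevisions.put("node3", 2L); // node left the cluster, stale entry (caveat 3)

        System.out.println("purged with stale entry: " + purge(journal, localRevisions));
        localRevisions.remove("node3"); // the manual clean-up from caveat 3
        System.out.println("purged after manual removal: " + purge(journal, localRevisions));

        // node4 has read up to revision 3 but never wrote that to the table (caveat 2)
        System.out.println("node4 can catch up: " + canCatchUp(journal, 3L));
    }
}
```

In this model a stale entry keeps the threshold low, so almost nothing is
purged, while a node whose revision was never recorded is ignored by the
threshold and can lose records it has not read yet.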