Posted to users@jackrabbit.apache.org by Julian Sedding <js...@apache.org> on 2017/05/22 14:33:50 UTC

Compaction in Oak 1.2.x

Hi all

I have an Oak repository, currently on 1.2.24, that has not been
compacted for far too long. I am told that running offline compaction
does not complete within ~10h. The system is in production and running
fine, but disk space is slowly running out (current size 1.6TB, of
which I expect at least 2/3 to be garbage).

An update to 1.4.x or higher is currently not possible, so I am trying
to find options to run a successful compaction on 1.2.24.

Option 1:
I have seen the "oak.compaction.eagerFlush=true" flag, which I assume
will help. Does anyone have experience with such a scenario?

Is the compaction taking so long because of the amount of data, or
because it is running out of memory and the Java GC is constantly
running? If the latter, does setting "eagerFlush" speed up the
compaction process significantly?

Also: does "eagerFlush" allow for partial compaction? I.e. if I run
compaction with "eagerFlush" for 4h and then abort, are those 4h lost,
or can I start the process again later with less work left to do?


Option 2:
As far as I understand, the problem with compaction in 1.2.x is
in-memory Java references that prevent old data from being cleaned up.
Assuming my understanding is correct, my hypothesis is that running
online-compaction directly after a restart of the system would yield
better results than running online-compaction on a long-running
system. Is this hypothesis valid?

Also: are there known data-corruption or data-loss issues with
online-compaction in 1.2.x?

Thank you for sharing any experiences and insights on these matters.

Regards
Julian