Posted to oak-commits@jackrabbit.apache.org by md...@apache.org on 2019/03/28 10:30:24 UTC

svn commit: r1856466 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segment/onrc-memoirs.md

Author: mduerig
Date: Thu Mar 28 10:30:24 2019
New Revision: 1856466

URL: http://svn.apache.org/viewvc?rev=1856466&view=rev
Log:
OAK-301: Document Oak
Memoirs in Garbage Collection:
- Improved cross referencing into GitHub
- Paragraph about how Oak 1.6 uses reachability to determine whether bulk segments are reclaimable

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segment/onrc-memoirs.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segment/onrc-memoirs.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segment/onrc-memoirs.md?rev=1856466&r1=1856465&r2=1856466&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segment/onrc-memoirs.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/nodestore/segment/onrc-memoirs.md Thu Mar 28 10:30:24 2019
@@ -57,9 +57,11 @@ Oak 1.6 was the first version with worka
 ### Generational garbage collection
 Oak 1.6 changed the mechanism used to determine reclaimability of segments. Previous versions used reachability through the segment graph starting from a set of GC roots consisting of the segment containing the current head node state and all segments containing records currently referenced by the JVM (i.e. by open sessions). 
 
-Oak 1.6 introduced the concept of a GC generation. GC generations are numbered starting at 0 and increase with each run of OnRC. Each segment records in its segment header the GC generation that was current when the segment was created. The current GC generation of the repository is simply the GC generation of the segment containing the current head state. The compactor reads the current GC generation of the repository and rewrites the head state, using the next GC generation number for the segments created in the process. Once the compactor has finished rewriting the current head state, the newly created, compact head state is atomically set as the new head state of the repository, implicitly and atomically increasing the GC generation of the repository at the same time.
+Oak 1.6 introduced the concept of a [GC generation](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/Segment.java#L380). GC generations are numbered starting at 0 and increase with each run of OnRC. Each segment records in its [segment header](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/SegmentBufferWriter.java#L200-L203) the GC generation that was current when the segment was created. The current GC generation of the repository is simply the GC generation of the segment containing the current head state. The compactor [reads](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStore.java#L845) the current GC generation of the repository and [rewrites](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStore.java#L864) the head state, using the next GC generation number for the segments created in the process. Once the compactor has finished rewriting the current head state, the newly created, compact head state is [atomically set](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStore.java#L876) as the new head state of the repository, implicitly and atomically increasing the GC generation of the repository at the same time.
 
-In its default configuration, the cleanup phase retains all segments from the current and the previous GC generation and reclaims all older segments. With the default daily OnRC execution, this results in a minimum segment retention time of 24 hours. Sessions that are open when OnRC runs will automatically [refresh](https://issues.apache.org/jira/browse/OAK-2407) on their next access to reduce the risk of them referencing content from segments that have been reclaimed.
+In its default configuration, the [cleanup](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStore.java#L1055) phase retains all segments from the current and the previous GC generation and reclaims all older segments. With the default daily OnRC execution, this results in a minimum segment retention time of 24 hours. Sessions that are open when OnRC runs will automatically [refresh](https://issues.apache.org/jira/browse/OAK-2407) on their next access to reduce the risk of them referencing content from segments that have been reclaimed.
+
+Since [bulk segments](http://jackrabbit.apache.org/oak/docs/nodestore/segment/records.html#Bulk_segments) do not have a segment header and thus cannot record their GC generation, the cleanup phase still uses [reachability](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/TarReader.java#L754) through the segment graph to determine whether a bulk segment is reclaimable. That is, a bulk segment is reclaimable if and only if it is not reachable through the segment graph of the non-reclaimable data segments, starting from an initial set of [root segment IDs](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/file/FileStore.java#L1076).
 
 ### Preventing references between segments with different GC generations
 The generation-based garbage collection approach disallows references between segments from different GC generations, as otherwise reclaiming an older generation would render a newer one incomplete, potentially causing subsequent [`SegmentNotFoundException`](https://github.com/apache/jackrabbit-oak/blob/jackrabbit-oak-1.6.0/oak-segment-tar/src/main/java/org/apache/jackrabbit/oak/segment/SegmentNotFoundException.java#L27)s. Unfortunately, up to Oak 1.4, references between segments of different GC generations could be introduced by sessions that were acquired before an OnRC cycle completed. Such sessions would reference records in segments of previous GC generations through their base state. When such a session subsequently saves, its changes are written to segments of the new GC generation, effectively creating references from this GC generation to the previous one. See [OAK-3348](https://issues.apache.org/jira/browse/OAK-3348) for the full story.
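
The GC generation bookkeeping described in the diff (segments record their generation, the repository's generation is that of the head segment, compaction writes into the next generation and swaps the head atomically) can be illustrated with a minimal Java sketch. It is a toy model under stated assumptions, not Oak's implementation: `SegmentStub` and `GenerationalHeadSketch` are hypothetical names, a "segment" is reduced to its GC generation plus an opaque head state, and the atomic head swap is modelled with `AtomicReference.compareAndSet`.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy model, not Oak's classes: a segment is reduced to the GC
// generation stored in its header plus an opaque head state.
final class SegmentStub {
    final int gcGeneration;
    final String headState;

    SegmentStub(int gcGeneration, String headState) {
        this.gcGeneration = gcGeneration;
        this.headState = headState;
    }
}

final class GenerationalHeadSketch {
    // Points at the segment containing the current head state.
    private final AtomicReference<SegmentStub> head =
            new AtomicReference<>(new SegmentStub(0, "initial"));

    // The repository's GC generation is just the generation of the head segment.
    int gcGeneration() {
        return head.get().gcGeneration;
    }

    String headState() {
        return head.get().headState;
    }

    // Regular commits write new segments in the current generation.
    void commit(String newState) {
        SegmentStub current;
        SegmentStub next;
        do {
            current = head.get();
            next = new SegmentStub(current.gcGeneration, newState);
        } while (!head.compareAndSet(current, next));
    }

    // Compaction reads the current generation, rewrites the head state into a
    // segment of the next generation and swaps the head atomically; setting the
    // new head implicitly bumps the repository's generation. Returns false if a
    // concurrent commit moved the head in the meantime (no retry in this sketch).
    boolean compact() {
        SegmentStub base = head.get();
        SegmentStub compacted = new SegmentStub(base.gcGeneration + 1, base.headState);
        return head.compareAndSet(base, compacted);
    }
}
```

After `compact()` returns `true`, `gcGeneration()` reports the next generation while `headState()` is unchanged. A compaction that loses the race against a concurrent commit leaves the generation untouched, which is why the swap has to be atomic; what to do in that case (for example retrying) is left out of the sketch.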
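
The default retention rule of the cleanup phase boils down to comparing generation numbers. The sketch below is hypothetical: `RetentionSketch` and its `retainedGenerations` parameter are illustrative (a value of 2 corresponds to "current plus previous generation"), and the retention-time helper assumes that OnRC runs at a fixed interval with cleanup immediately following compaction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of generation-based reclaimability; not Oak's API.
final class RetentionSketch {

    // A segment is reclaimable once it falls outside the retained window.
    // With retainedGenerations = 2 the current and the previous generation survive.
    static boolean isReclaimable(int segmentGeneration, int repositoryGeneration,
                                 int retainedGenerations) {
        return segmentGeneration <= repositoryGeneration - retainedGenerations;
    }

    // Returns the generations (one entry per segment) that cleanup would reclaim.
    static List<Integer> reclaimed(List<Integer> segmentGenerations,
                                   int repositoryGeneration, int retainedGenerations) {
        List<Integer> result = new ArrayList<>();
        for (int generation : segmentGenerations) {
            if (isReclaimable(generation, repositoryGeneration, retainedGenerations)) {
                result.add(generation);
            }
        }
        return result;
    }

    // Worst case: a segment written just before a compaction run survives that run
    // and is reclaimed by the cleanup (retainedGenerations - 1) intervals later.
    static int minimalRetentionHours(int retainedGenerations, int intervalHours) {
        return (retainedGenerations - 1) * intervalHours;
    }
}
```

With the defaults described in the diff, `minimalRetentionHours(2, 24)` gives the stated 24 hours; under the same assumptions a segment written right after a compaction run survives about twice as long.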
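
For bulk segments, the diff describes a reachability check instead of a generation comparison. The following sketch shows one reading of that rule under simplifying assumptions: segment IDs are plain strings, each data segment lists the bulk segments it references (bulk segments hold only binary data and reference nothing, so one hop through the data segments suffices), and `reclaimGeneration` is a hypothetical predicate flagging reclaimable data segment generations. It mirrors the idea, not the actual `TarReader` code.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

// Hypothetical sketch of reachability-based reclamation of bulk segments.
final class BulkSweepSketch {

    // A data segment carries its GC generation and the IDs of the bulk
    // segments it references.
    static final class DataSegment {
        final String id;
        final int generation;
        final List<String> bulkReferences;

        DataSegment(String id, int generation, List<String> bulkReferences) {
            this.id = id;
            this.generation = generation;
            this.bulkReferences = bulkReferences;
        }
    }

    // Returns the IDs of the bulk segments that cleanup may reclaim.
    static Set<String> reclaimableBulkSegments(List<DataSegment> dataSegments,
                                               Set<String> bulkSegmentIds,
                                               Set<String> rootIds,
                                               IntPredicate reclaimGeneration) {
        // Everything named by the root IDs is kept.
        Set<String> reachable = new HashSet<>(rootIds);
        for (DataSegment segment : dataSegments) {
            // Only non-reclaimable data segments keep their bulk segments alive.
            if (!reclaimGeneration.test(segment.generation)) {
                reachable.addAll(segment.bulkReferences);
            }
        }
        Set<String> reclaimable = new HashSet<>(bulkSegmentIds);
        reclaimable.removeAll(reachable);
        return reclaimable;
    }
}
```

In this model, a bulk segment referenced only by a data segment of a reclaimable generation is reclaimed in the same cleanup run as that data segment.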
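
The invariant from the last paragraph of the diff, that no segment references a segment of another GC generation, can in principle be enforced at write time by copying rather than referencing records from a different generation. The following Java sketch illustrates that idea on a hypothetical record model with a single child reference per record; `CrossGenerationSketch`, `adopt` and `write` are illustrative names, and this is one possible approach under these assumptions, not a description of the fix tracked in OAK-3348.

```java
// Hypothetical sketch: a writer that never creates cross-generation references.
final class CrossGenerationSketch {

    // A record lives in a segment of some GC generation and may reference one child.
    static final class Record {
        final int generation;
        final String value;
        final Record child;

        Record(int generation, String value, Record child) {
            this.generation = generation;
            this.value = value;
            this.child = child;
        }
    }

    private final int writerGeneration;
    int copies; // number of records copied because they came from another generation

    CrossGenerationSketch(int writerGeneration) {
        this.writerGeneration = writerGeneration;
    }

    // Returns an equivalent record in the writer's generation. Records already in
    // that generation are reused (their children are assumed to satisfy the
    // invariant); records from any other generation are copied recursively.
    Record adopt(Record record) {
        if (record == null || record.generation == writerGeneration) {
            return record;
        }
        copies++;
        return new Record(writerGeneration, record.value, adopt(record.child));
    }

    // Writes a new record on top of a possibly stale base state, such as the base
    // state of a session acquired before an OnRC cycle completed.
    Record write(String value, Record base) {
        return new Record(writerGeneration, value, adopt(base));
    }
}
```

A session whose base state still lives in generation 4 thus saves into generation 5 without leaving references to generation 4 behind; the price is the cost of copying the stale part of the base state.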