Posted to oak-commits@jackrabbit.apache.org by am...@apache.org on 2016/12/15 09:01:17 UTC

svn commit: r1774390 - /jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md

Author: amitj
Date: Thu Dec 15 09:01:17 2016
New Revision: 1774390

URL: http://svn.apache.org/viewvc?rev=1774390&view=rev
Log:
OAK-301: Document Blob id tracker functionality

Modified:
    jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md

Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md?rev=1774390&r1=1774389&r2=1774390&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md Thu Dec 15 09:01:17 2016
@@ -147,7 +147,30 @@ The garbage collection can be triggered
 * If the MBeans are registered in the MBeanServer then the following can also be used to trigger GC:
     * `BlobGC#startBlobGC()` which takes in a `markOnly` boolean parameter to indicate mark only or complete gc
 
- 
+<a name="blobid-tracker"></a>  
+#### Caching of Blob ids locally (Oak 1.6.x)
+
+For the `FileDataStore` and `S3DataStore`, blob ids are cached locally on disk as they are created, which 
+speeds up the 'Mark BlobStore' phase of garbage collection. The locally tracked ids are synchronized with the data 
+store periodically so that other cluster nodes, or different repositories sharing the datastore, can get a 
+consolidated list of all blob ids. The synchronization interval is defined by the OSGi configuration parameter 
+`blobTrackSnapshotIntervalInSecs` of the configured NodeStore services.
+
+If two garbage collection cycles are executed within `blobTrackSnapshotIntervalInSecs`, the logs may contain 
+warnings about missing blob ids because the deletions from the earlier GC have not yet been synchronized with the 
+data store. These warnings can either be ignored, or the `blobTrackSnapshotIntervalInSecs` parameter can be 
+adjusted to match the schedule identified for running blob gc.
+
+When upgrading an existing system to take advantage of this caching, the existing blob ids have to be cached 
+first. Execute one of the following:
+
+* Use `MarkSweepGarbageCollector#collectGarbage(boolean markOnly, boolean forceBlobRetrieve)` with `true` for the 
+`forceBlobRetrieve` parameter to force retrieval of blob ids from the datastore and also cache them locally.
+* Execute Blob GC before the configured `blobTrackSnapshotIntervalInSecs` duration has elapsed.
+* Execute a [consistency check](#consistency-check) from the JMX BlobGC MBean before the configured 
+`blobTrackSnapshotIntervalInSecs` duration has elapsed.
+* Execute `datastorecheck` command offline using oak-run with the `--track` option as defined in [consistency check](#consistency-check).
+
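+As an illustrative sketch of the offline option above (the jar name is a placeholder, and any options besides 
+`datastorecheck` and `--track` are assumptions that depend on the oak-run version in use; consult its help output):
+
+```
+# Offline datastore check which also tracks (caches) the blob ids locally.
+# Additional options selecting the node store and data store vary by version.
+java -jar oak-run.jar datastorecheck --track
+```
+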
 #### Shared DataStore Blob Garbage Collection (Since 1.2.0)
 
 ##### Registration
@@ -293,6 +316,7 @@ the steps:
 * Remove other files corresponding to the particular repositoryId e.g. `markedTimestamp-[repositoryId]` or 
 `references-[repositoryId]`.
 
+<a name="consistency-check"></a>  
 #### Consistency Check
 The data store consistency check will report any data store binaries that are missing but are still referenced. The 
 consistency check can be triggered by: