Posted to oak-commits@jackrabbit.apache.org by am...@apache.org on 2016/12/15 09:01:17 UTC
svn commit: r1774390 -
/jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
Author: amitj
Date: Thu Dec 15 09:01:17 2016
New Revision: 1774390
URL: http://svn.apache.org/viewvc?rev=1774390&view=rev
Log:
OAK-301: Document Blob id tracker functionality
Modified:
jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
Modified: jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md
URL: http://svn.apache.org/viewvc/jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md?rev=1774390&r1=1774389&r2=1774390&view=diff
==============================================================================
--- jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md (original)
+++ jackrabbit/oak/trunk/oak-doc/src/site/markdown/plugins/blobstore.md Thu Dec 15 09:01:17 2016
@@ -147,7 +147,30 @@ The garbage collection can be triggered
* If the MBeans are registered in the MBeanServer then the following can also be used to trigger GC:
* `BlobGC#startBlobGC()` which takes in a `markOnly` boolean parameter to indicate mark only or complete gc
-
+<a name="blobid-tracker"></a>
+#### Caching of Blob ids locally (Oak 1.6.x)
+
+For the `FileDataStore` and `S3DataStore`, blob ids are cached locally on disk when the blobs are created, which
+speeds up the 'Mark BlobStore' phase of garbage collection. The locally tracked ids are synchronized with the data
+store periodically, so that other cluster nodes, or different repositories sharing the datastore, can obtain a
+consolidated list of all blob ids. The synchronization interval is defined by the OSGi configuration parameter
+`blobTrackSnapshotIntervalInSecs` of the configured NodeStore service.
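+For example, the interval could be set through an OSGi configuration file for the NodeStore service. The PID and
+file name below are illustrative and depend on the deployment:
+
+```
+# org.apache.jackrabbit.oak.plugins.document.DocumentNodeStoreService.config (illustrative PID)
+# Synchronize locally tracked blob ids with the data store every 12 hours
+blobTrackSnapshotIntervalInSecs=I"43200"
+```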
+
+If two garbage collection cycles are executed within `blobTrackSnapshotIntervalInSecs`, warnings about missing blob
+ids may appear in the logs, because the deletions performed by the earlier gc have not yet been synchronized with the
+data store. It is safe to ignore these warnings, or the `blobTrackSnapshotIntervalInSecs` parameter can be adjusted
+to match the schedule identified for running blob gc.
+
+When upgrading an existing system to take advantage of caching, the existing blob ids have to be cached first. One of
+the following should be executed:
+
+* Use `MarkSweepGarbageCollector#collectGarbage(boolean markOnly, boolean forceBlobRetrieve)` with `true` for the
+`forceBlobRetrieve` parameter to force retrieval of the blob ids from the datastore and cache them locally as well.
+* Execute Blob GC before the configured `blobTrackSnapshotIntervalInSecs` interval elapses.
+* Execute a [consistency check](#consistency-check) from the JMX BlobGCMbean before the configured
+`blobTrackSnapshotIntervalInSecs` interval elapses.
+* Execute `datastorecheck` command offline using oak-run with the `--track` option as defined in [consistency check](#consistency-check).
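+As a sketch, the offline option could be invoked as below. The jar name, configuration file, and paths are
+illustrative; consult the oak-run documentation of the deployed version for the exact option names:
+
+```
+java -jar oak-run-*.jar datastorecheck --fds datastore.cfg --store /path/to/nodestore --track /path/to/repository/home
+```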
+
#### Shared DataStore Blob Garbage Collection (Since 1.2.0)
##### Registration
@@ -293,6 +316,7 @@ the steps:
* Remove other files corresponding to the particular repositoryId e.g. `markedTimestamp-[repositoryId]` or
`references-[repositoryId]`.
+<a name="consistency-check"></a>
#### Consistency Check
The data store consistency check will report any data store binaries that are missing but are still referenced. The
consistency check can be triggered by: