Posted to commits@jackrabbit.apache.org by Apache Wiki <wi...@apache.org> on 2007/09/13 17:12:51 UTC
[Jackrabbit Wiki] Update of "DataStore" by ThomasMueller

The following page has been changed by ThomasMueller:
http://wiki.apache.org/jackrabbit/DataStore

New page:
== How to configure the file data store ==

To use the FileDataStore, add this to your repository.xml after the <Repository> start tag:

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore"/>

== Additional configuration options ==

This is a full configuration using the default values:

    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore">
        <param name="path" value="${rep.home}/repository/datastore"/>
        <param name="minRecordLength" value="100"/>
    </DataStore>

== Clustering ==

Clustering is supported if you use a clustered file system. You need to set the data store path of all cluster nodes to the same location.

== How does it work ==

When adding a binary object, Jackrabbit checks its size. If it is larger than minRecordLength, it is added to the data store; otherwise it is kept in memory. This is done very early (possibly when calling Property.setValue(stream)). Only the unique data identifier is stored in the persistence manager (except for in-memory objects, where the data itself is stored). When updating a value, the old value is kept in the data store and the new value is added (there is no update operation).
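
The behaviour described above can be sketched in a few lines of Java. This is only an illustration, not the Jackrabbit implementation: the class AddOnlyStoreSketch and its methods are hypothetical, an in-memory map stands in for the files on disk, and deriving the identifier from a SHA-1 digest of the content is an assumption made here (the text above only says that the identifier is unique).

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Illustrative add-only store; hypothetical, not one of Jackrabbit's classes. */
class AddOnlyStoreSketch {
    // Threshold playing the role of the minRecordLength parameter.
    private final int minRecordLength;
    // Stands in for the data store directory: identifier -> content.
    private final Map<String, byte[]> records = new HashMap<>();

    AddOnlyStoreSketch(int minRecordLength) {
        this.minRecordLength = minRecordLength;
    }

    /** True if a value of this length goes to the data store instead of staying in memory. */
    boolean goesToDataStore(int length) {
        return length > minRecordLength;
    }

    /** Adds a record and returns its identifier; existing records are never modified. */
    String addRecord(byte[] data) {
        String id = identifierOf(data);
        records.putIfAbsent(id, data.clone());
        return id;
    }

    /** Returns a copy of the record content, or null if the identifier is unknown. */
    byte[] getRecord(String id) {
        byte[] data = records.get(id);
        return data == null ? null : data.clone();
    }

    int recordCount() {
        return records.size();
    }

    // Assumption: the identifier is the hex-encoded SHA-1 digest of the content.
    static String identifierOf(byte[] data) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

In this sketch, "updating" a value means calling addRecord with the new content and storing the returned identifier; the record for the old content stays where it is until garbage collection removes it.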

The current implementation still stores temporary files in some situations, for example in the RMI client. Those cases will be changed to use the data store directly where it makes sense.

Very small objects (where it does not make sense to create a file) are kept in memory.

Objects in the data store are only removed when they are no longer reachable. There is no 'update' operation, only 'add new entry'. Data is added before the transaction is committed. Additions are globally atomic; cluster nodes can share the same data store. Even different repositories can share the same store, as long as garbage collection is done correctly.

== Running data store garbage collection ==

Running the garbage collection is currently a manual process.
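
Since objects are only removed when they are no longer reachable, garbage collection can be thought of as a mark-and-sweep over record identifiers. The following is a minimal sketch under that assumption, not the actual Jackrabbit garbage collector: the class GarbageCollectionSketch and both methods are hypothetical, and identifiers are plain strings.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Illustrative mark-and-sweep over record identifiers; hypothetical, not Jackrabbit's API. */
class GarbageCollectionSketch {

    /** Mark phase: union of the identifiers referenced by every repository sharing the store. */
    static Set<String> mark(Collection<Set<String>> referencesPerRepository) {
        Set<String> reachable = new HashSet<>();
        for (Set<String> references : referencesPerRepository) {
            reachable.addAll(references);
        }
        return reachable;
    }

    /** Sweep phase: drops every stored identifier that was not marked; returns how many were removed. */
    static int sweep(Set<String> storedIdentifiers, Set<String> reachable) {
        int before = storedIdentifiers.size();
        storedIdentifiers.retainAll(reachable);
        return before - storedIdentifiers.size();
    }
}
```

The mark phase takes the references of all repositories because a shared store must keep a record as long as any one of them still uses it. The sketch ignores records that are added while the scan is running; because data is added before the transaction is committed, a real collector would have to avoid sweeping such records.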

== How to write a new data store implementation ==

New implementations are welcome! An S3 data store (http://en.wikipedia.org/wiki/Amazon_S3) would be cool. Maybe somebody needs a database data store. A caching data store would be great as well (items that are used a lot are stored in a fast file system, others in a slower one).