Posted to dev@jackrabbit.apache.org by "Thomas Mueller (JIRA)" <ji...@apache.org> on 2007/07/02 17:02:04 UTC

[jira] Updated: (JCR-926) Global data store for binaries

     [ https://issues.apache.org/jira/browse/JCR-926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Mueller updated JCR-926:
-------------------------------

    Attachment: dataStore3.patch

With this patch, both the old style 'BLOBStore' and the new style 'DataStore' implementations co-exist. The BLOBStore is used by default. To use the DataStore, set the System Property org.jackrabbit.useDataStore to 'true' as in:

java -Dorg.jackrabbit.useDataStore=true ...

So this patch can be used to test the GlobalDataStore feature. All unit tests pass.
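
For readers wondering how a switch like this is usually read on the Java side, here is a minimal sketch. It is not taken from the patch: the class and method names (StoreSelectorSketch, useDataStore) are made up for illustration, and only the property name org.jackrabbit.useDataStore comes from the message above.

```java
class StoreSelectorSketch {
    // Property name as given in the message; everything else here is hypothetical.
    static final String USE_DATA_STORE = "org.jackrabbit.useDataStore";

    /** Returns true only if the system property is set to "true" (case-insensitive). */
    static boolean useDataStore() {
        // Boolean.getBoolean reads a system property; it does not parse its argument.
        return Boolean.getBoolean(USE_DATA_STORE);
    }

    public static void main(String[] args) {
        System.out.println(useDataStore() ? "DataStore" : "BLOBStore");
    }
}
```

With this approach the BLOBStore stays the default whenever the property is absent or set to anything other than 'true'.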

A few things are still missing. There is no garbage collection yet. Also, almost every blob creates two new subdirectories; this works, but it is a bit slower and produces a lot of directories. This can perhaps be avoided.

The abstract class BLOBFileValue is now called BLOBValue (because it is no longer always a file). The old BLOBFileValue is now again named BLOBFileValue.

A new class, Base64ReaderInputStream, lets BufferedStringValue avoid creating a file when converting long Base64 strings to BINARY data. The higher performance is actually just a side effect; the main reason for implementing it was that the old constructor is based on a file resource and can't be used with the DataStore.
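
To illustrate the idea of decoding Base64 text on the fly instead of going through a temporary file, here is a rough sketch. It is not the Base64ReaderInputStream from the patch; the class name Base64ReaderStreamSketch is hypothetical, and it relies on java.util.Base64, which is an assumption about the runtime (Java 8 or later).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.util.Base64;

/** Hypothetical sketch: an InputStream that decodes Base64 characters read from a Reader. */
class Base64ReaderStreamSketch extends InputStream {
    private final Reader reader;
    private final char[] quad = new char[4]; // one Base64 group of up to four characters
    private byte[] decoded = new byte[0];    // bytes decoded from the current group
    private int pos = 0;                     // next unread index in 'decoded'

    Base64ReaderStreamSketch(Reader reader) {
        this.reader = reader;
    }

    @Override
    public int read() throws IOException {
        if (pos >= decoded.length && !fill()) {
            return -1; // end of the Base64 text
        }
        return decoded[pos++] & 0xff;
    }

    /** Reads the next group of Base64 characters (skipping whitespace) and decodes it. */
    private boolean fill() throws IOException {
        int n = 0;
        while (n < 4) {
            int c = reader.read();
            if (c < 0) {
                break;
            }
            if (!Character.isWhitespace(c)) {
                quad[n++] = (char) c;
            }
        }
        if (n == 0) {
            return false;
        }
        try {
            decoded = Base64.getDecoder().decode(new String(quad, 0, n));
        } catch (IllegalArgumentException e) {
            throw new IOException("Invalid Base64 input", e);
        }
        pos = 0;
        return decoded.length > 0;
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new Base64ReaderStreamSketch(new StringReader("aGVsbG8="))) {
            int b;
            while ((b = in.read()) >= 0) {
                System.out.print((char) b);
            }
        }
    }
}
```

Decoding four characters at a time keeps the sketch short; a real implementation would buffer larger chunks and override read(byte[], int, int).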


> Global data store for binaries
> ------------------------------
>
>                 Key: JCR-926
>                 URL: https://issues.apache.org/jira/browse/JCR-926
>             Project: Jackrabbit
>          Issue Type: New Feature
>          Components: core
>            Reporter: Jukka Zitting
>         Attachments: dataStore.patch, DataStore.patch, DataStore2.patch, dataStore3.patch, internalValue.patch, ReadWhileSaveTest.patch
>
>
> There are three main problems with the way Jackrabbit currently handles large binary values:
> 1) Persisting a large binary value blocks access to the persistence layer for extended amounts of time (see JCR-314)
> 2) At least two copies of binary streams are made when saving them through the JCR API: one in the transient space, and one when persisting the value
> 3) Versioning and copy operations on nodes or subtrees that contain large binary values can quickly end up consuming excessive amounts of storage space.
> To solve these issues (and to get other nice benefits), I propose that we implement a global "data store" concept in the repository. A data store is an append-only set of binary values that uses short identifiers to identify and access the stored binary values. The data store would trivially fit the requirements of transient space and transaction handling due to the append-only nature. An explicit mark-and-sweep garbage collection process could be added to avoid concerns about storing garbage values.
> See the recent NGP value record discussion, especially [1], for more background on this idea.
> [1] http://mail-archives.apache.org/mod_mbox/jackrabbit-dev/200705.mbox/%3c510143ac0705120919k37d48dc1jc7474b23c9f02cbd@mail.gmail.com%3e
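
The append-only store and the mark-and-sweep collection described in the issue can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design from any of the attached patches: the class name InMemoryDataStoreSketch is hypothetical, records are kept in memory rather than in files, and identifiers are assumed to be SHA-1 content hashes (the issue only says "short identifiers").

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory sketch of an append-only data store with mark-and-sweep GC. */
class InMemoryDataStoreSketch {
    private final Map<String, byte[]> records = new HashMap<>();
    private final Set<String> marked = new HashSet<>();

    /** Adds a value and returns its identifier; existing records are never modified. */
    String add(byte[] data) {
        String id = sha1Hex(data); // assumption: the identifier is a content hash
        records.putIfAbsent(id, data.clone());
        return id;
    }

    /** Returns a copy of the stored value, or null if the identifier is unknown. */
    byte[] get(String id) {
        byte[] data = records.get(id);
        return data == null ? null : data.clone();
    }

    /** Mark phase: call once for every identifier that is still referenced. */
    void mark(String id) {
        marked.add(id);
    }

    /** Sweep phase: drops all unmarked records and returns how many were removed. */
    int sweep() {
        int before = records.size();
        records.keySet().retainAll(marked);
        marked.clear();
        return before - records.size();
    }

    int size() {
        return records.size();
    }

    private static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under the content-hash assumption, adding the same bytes twice yields the same identifier and a single record, which is one way versioning and copy operations could avoid duplicating large values.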

-- 
This message is automatically generated by JIRA.