Posted to dev@jackrabbit.apache.org by KÖLL Claus <C....@TIROL.GV.AT> on 2012/07/04 07:27:17 UTC

MultiDataStore ...

I'm currently implementing a MultiDataStore and would like to know whether
anyone else is interested; if so, I would provide a patch.

The background:
We are storing a huge amount of files in Jackrabbit (about 2 TB at the moment).
We are using a DbDataStore running against a highly available Oracle cluster. The problem is
that we must keep files for up to 80 years to meet government requirements, so the cost of
the backend will increase every year. The plan is now to move files, based on their age, from
one datastore to another. The archive DataStore is mapped to a cheaper backend such as a
SATA RAID or a tape library.

This will be done by a DataStoreJanitor. Running as a background process, it would move
files to the other datastore based on their modified date.
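
A minimal sketch of one janitor pass, under loud assumptions: SketchStore,
its Entry class and DataStoreJanitorSketch are hypothetical in-memory
stand-ins invented for illustration, not Jackrabbit classes, and the cutoff
is simply passed in by the caller.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for a datastore; NOT the Jackrabbit API.
class SketchStore {
    // One stored binary plus its last-modified timestamp.
    static final class Entry {
        final byte[] data;
        final Instant lastModified;

        Entry(byte[] data, Instant lastModified) {
            this.data = data;
            this.lastModified = lastModified;
        }
    }

    final Map<String, Entry> entries = new ConcurrentHashMap<>();
}

// Hypothetical janitor: one pass that archives everything older than a cutoff.
class DataStoreJanitorSketch {
    // Moves entries last modified before the cutoff; returns the moved ids.
    static List<String> moveOlderThan(SketchStore primary, SketchStore archive, Instant cutoff) {
        List<String> moved = new ArrayList<>();
        for (Map.Entry<String, SketchStore.Entry> e : primary.entries.entrySet()) {
            if (e.getValue().lastModified.isBefore(cutoff)) {
                // Write to the archive first, then delete from the primary,
                // so the entry is never missing from both stores.
                archive.entries.put(e.getKey(), e.getValue());
                primary.entries.remove(e.getKey());
                moved.add(e.getKey());
            }
        }
        return moved;
    }
}
```

Copying before deleting means a reader that checks the primary first and then
the archive can always find the entry while the pass is running. A background
process would call moveOlderThan periodically with a fresh cutoff.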

The MultiDataStore is just a DataStore wrapper containing two DataStores. Appends
go to the primary DataStore; reads look in the primary first and fall back to the
archive DataStore if the file is not found there. The GarbageCollector would remove
files only from the archive DataStore.
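
The read and write paths described above can be sketched as follows. This is
only an illustration: BlobStore, InMemoryBlobStore and MultiBlobStore are
hypothetical names, and the simplified byte-array interface is an assumption
that does not match the real Jackrabbit DataStore interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal store contract, assumed for this sketch only.
interface BlobStore {
    void put(String id, byte[] data);

    Optional<byte[]> get(String id);
}

// Trivial in-memory store so the wrapper can be exercised.
class InMemoryBlobStore implements BlobStore {
    private final Map<String, byte[]> blobs = new ConcurrentHashMap<>();

    public void put(String id, byte[] data) {
        blobs.put(id, data);
    }

    public Optional<byte[]> get(String id) {
        return Optional.ofNullable(blobs.get(id));
    }
}

// Wrapper holding two stores: writes hit the primary, reads fall back.
class MultiBlobStore implements BlobStore {
    private final BlobStore primary;
    private final BlobStore archive;

    MultiBlobStore(BlobStore primary, BlobStore archive) {
        this.primary = primary;
        this.archive = archive;
    }

    public void put(String id, byte[] data) {
        // New content always goes to the primary store.
        primary.put(id, data);
    }

    public Optional<byte[]> get(String id) {
        // Look in the primary first; consult the archive only on a miss.
        Optional<byte[]> hit = primary.get(id);
        return hit.isPresent() ? hit : archive.get(id);
    }
}
```

Because the wrapper implements the same interface as the stores it wraps,
callers do not need to know which backend currently holds a given file.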

The configuration could look like:
<MultiDataStore class="org.apache.jackrabbit.core.data.MultiDataStore" >
  <primary>
    <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">
    ...
    </DataStore>
  </primary>
  <archive>
    <DataStore class="org.apache.jackrabbit.core.data.FileDataStore"> 
    ...
    </DataStore>
  </archive>
</MultiDataStore>

greets
claus

Re: MultiDataStore ...

Posted by Luca Tagliani <l....@cbt.it>.
Hi Claus,
  I would be interested in your implementation.
I think it would be very good for Jackrabbit to have this capability.
Thinking of my customers, I can already see situations in which this
implementation could be very helpful.

BR

Luca


AW: MultiDataStore ...

Posted by KÖLL Claus <C....@TIROL.GV.AT>.
Hi Jukka,

My initial patch will not handle your use case, but I think we can extend it later
or implement a decorator datastore separately.

I had not settled on the configuration yet, and I had been thinking about the same
approach as the one you described. I think this configuration format will be the best way.

I will now open a Jira issue for the first draft of a MultiDataStore.

greets
claus

Re: MultiDataStore ...

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Wed, Jul 4, 2012 at 7:27 AM, KÖLL Claus <C....@tirol.gv.at> wrote:
> I'm implementing at the moment a MultiDataStore and want to know if
> someone else is interested so i would provide a patch.

Sounds useful. A somewhat similar case I've been thinking about is a
local datastore cache that could act as a decorator for another, remote
datastore backend. The MultiDataStore approach should also be helpful in
implementing something like that.
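
One possible shape for such a caching decorator, as a rough sketch only:
CachingStoreSketch is a hypothetical name, the remote backend is modelled as
a plain lookup function, and nothing here reflects an existing Jackrabbit
class.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical read-through cache in front of a slower remote lookup.
class CachingStoreSketch {
    // Stand-in for the remote datastore: maps an id to its content.
    private final Function<String, byte[]> remote;
    private final Map<String, byte[]> localCache = new ConcurrentHashMap<>();

    CachingStoreSketch(Function<String, byte[]> remote) {
        this.remote = remote;
    }

    // Returns cached content if present; otherwise asks the remote backend
    // once and remembers the answer. A null remote result is not cached.
    byte[] get(String id) {
        return localCache.computeIfAbsent(id, remote);
    }
}
```

The sketch never evicts entries, so a real cache would also need a size or
age limit.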

> The configuration could look like:
> <MultiDataStore class="org.apache.jackrabbit.core.data.MultiDataStore" >
>   <primary>
>     <DataStore class="org.apache.jackrabbit.core.data.db.DbDataStore">

This is the tricky bit, as our XML-based configuration mechanism isn't
very easy to adjust for such features.

Instead of extending the XML configuration format with yet another
custom structure, how about we allow the existing <param/> elements to
specify custom objects instead of just scalar values? That would turn
the above configuration into:

  <DataStore class="org.apache.jackrabbit.core.data.MultiDataStore" >
    <param name="primary" class="org.apache.jackrabbit.core.data.db.DbDataStore">
      ...
    </param>
    <param name="archive" class="org.apache.jackrabbit.core.data.FileDataStore">
       ...
    </param>
  </DataStore>

Such a feature should also come in handy for many other custom
configuration needs.

BR,

Jukka Zitting