Posted to dev@jackrabbit.apache.org by JOSE FELIX HERNANDEZ BARRIO <jo...@isthari.com> on 2010/01/24 12:28:28 UTC

change proposal DataStore

Our goal is to be able to create more complex policies for the datastore
component, for example:
- Assign a datastore per workspace (customer) so it's possible to measure
(and limit) storage usage for a given customer
- Dynamic allocation, so newer or more frequently accessed nodes will be
stored on faster disks and old nodes will be moved to slower SATA disks

What is the group's opinion?


To be able to carry out this task, we propose the following changes:
- modify the workspace DTD to allow inclusion of a datastore configuration
- modify the configuration reader/factory for workspaces
- modify the datastore interface to pass the additional information needed
by the new features: node info, workspace info ...



-- 
Jose Hernandez
675599600
Isthari
http://www.isthari.com

Re: change proposal DataStore

Posted by Alexander Klimetschek <ak...@day.com>.
On Mon, Jan 25, 2010 at 11:04, JOSE FELIX HERNANDEZ BARRIO
<jo...@isthari.com> wrote:
> I think that if we extend the datastore interface, especially the method:
> DataRecord addRecord(InputStream stream)
> to add more information, for example: workspace name, node name, property name ...
> it would be possible to create more complex implementations.

That would be against a core principle of the data store. It is
independent from the JCR structure, and a record is only ever stored
once, i.e. if there are multiple usages of the same binary (= file) in
the JCR, it will be stored only once in the data store. The addRecord
method will then create a new record only the first time that content is
added.
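
To illustrate the point about binaries being stored only once, here is a
minimal sketch of a content-addressed store. It is not the Jackrabbit
DataStore API: the class ContentAddressedStore, its in-memory map, and the
choice of SHA-1 as the identifier are assumptions made for this example.
Because the identifier depends only on the bytes, the store has no notion of
which workspace, node, or property a binary belongs to.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy store for illustration only, not the Jackrabbit interface.
class ContentAddressedStore {
    // Identifier (hex digest of the content) -> stored bytes.
    private final Map<String, byte[]> records = new HashMap<>();

    /** Reads the stream fully and returns an identifier derived from its bytes. */
    String addRecord(InputStream stream) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1"); // assumption of this sketch
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = stream.read(chunk)) != -1) {
            digest.update(chunk, 0, n);
            buffer.write(chunk, 0, n);
        }
        StringBuilder id = new StringBuilder();
        for (byte b : digest.digest()) {
            id.append(String.format("%02x", b));
        }
        // Identical content maps to the same identifier, so it is kept only once.
        records.putIfAbsent(id.toString(), buffer.toByteArray());
        return id.toString();
    }

    /** Number of distinct binaries currently held. */
    int recordCount() {
        return records.size();
    }
}
```

Adding the same bytes twice in this sketch returns the same identifier and
leaves a single stored record, whichever node the caller had in mind.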

Regards,
Alex

--
Alexander Klimetschek
alexander.klimetschek@day.com

Re: change proposal DataStore

Posted by Thomas Müller <th...@day.com>.
Hi,

> extend the datastore interface
> workspace name, node name, property name ...

I'm not sure if the workspace / node name / node identifier / property name
is always available.

One advantage of this addition would be: it could speed up garbage
collection. If a binary object "knows" the node identifier(s), garbage
collection could check the large objects first (because you could keep links
from the binary object to the place where it is / was used). I'm not
completely against such a change, however for the given problem (accounting)
it does sound like the wrong solution.

Regards,
Thomas

Re: change proposal DataStore

Posted by JOSE FELIX HERNANDEZ BARRIO <jo...@isthari.com>.
I think that if we extend the datastore interface, especially the method:
DataRecord addRecord(InputStream stream)
to add more information, for example: workspace name, node name, property
name ...
it would be possible to create more complex implementations.

2010/1/25 Thomas Müller <th...@day.com>

> Hi,
>
> Currently there is only one data store per repository. If you need a data
> store per workspace, then you need one repository per workspace.
>
> > - Assign a datastore per workspace (customer) so it's possible to measure
> > (and limit) storage usage for a given customer
>
> This sounds more like an "accounting problem" than a "technical
> problem". Could you add some accounting code to the application? For
> example, use an observation listener (a JCR EventListener) to calculate
> the disk space used by a user (workspace). Or, use a wrapper around the
> input stream and measure / limit storage that way.
>

I will try the observation listener; it sounds interesting.



>
> > - Dynamic allocation, so newer or more frequently accessed nodes will be
> > stored on faster disks and old nodes will be moved to slower SATA disks
>
> Such a 'caching data store' would be nice. It's a bit tricky to
> implement I think. Currently, there is no such implementation, however
> patches are always welcome.
>

I'm about to start working on this implementation. I will open a new thread
on the list as soon as I have some interesting results.




>
> Regards,
> Thomas
>




Re: change proposal DataStore

Posted by Thomas Müller <th...@day.com>.
Hi,

Currently there is only one data store per repository. If you need a data
store per workspace, then you need one repository per workspace.

> - Assign a datastore per workspace (customer) so it's possible to measure
> (and limit) storage usage for a given customer

This sounds more like an "accounting problem" than a "technical
problem". Could you add some accounting code to the application? For
example, use an observation listener (a JCR EventListener) to calculate
the disk space used by a user (workspace). Or, use a wrapper around the
input stream and measure / limit storage that way.
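
The input stream wrapper idea could look roughly like the sketch below. The
class name QuotaInputStream and its limit parameter are hypothetical; the
application would wrap each upload stream before handing it to the
repository, and would need its own bookkeeping to sum the counts per customer.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts the bytes read and fails once a limit is exceeded.
class QuotaInputStream extends FilterInputStream {
    private final long limit; // maximum number of bytes allowed through
    private long count;       // bytes read so far

    QuotaInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            add(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            add(n);
        }
        return n;
    }

    // Disable mark/reset so that re-read bytes cannot be counted twice.
    @Override
    public boolean markSupported() {
        return false;
    }

    private void add(long n) throws IOException {
        count += n;
        if (count > limit) {
            throw new IOException("Storage quota of " + limit + " bytes exceeded");
        }
    }

    /** Bytes read through this wrapper so far. */
    long getCount() {
        return count;
    }
}
```

Note that this sketch measures bytes read, not bytes actually stored, and
skipped bytes are not counted. Since the data store keeps identical binaries
only once, the two numbers can differ.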

> - Dynamic allocation, so newer or more frequently accessed nodes will be
> stored on faster disks and old nodes will be moved to slower SATA disks

Such a 'caching data store' would be nice. It's a bit tricky to
implement I think. Currently, there is no such implementation, however
patches are always welcome.
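
As a rough illustration of the idea only, the sketch below keeps the most
recently used records in a small "fast" tier and demotes the least recently
used ones to a "slow" tier. Everything here is hypothetical: TieredStore, its
in-memory maps standing in for fast and slow disks, and the least recently
used policy are assumptions of this example, not an existing Jackrabbit
implementation.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier store; the maps stand in for fast and slow disks.
class TieredStore {
    private final Map<String, byte[]> slow = new HashMap<>();
    private final LinkedHashMap<String, byte[]> fast;

    TieredStore(final int fastCapacity) {
        // Access-ordered map: iteration order runs from least to most recently used.
        fast = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                if (size() > fastCapacity) {
                    // Demote the least recently used record to the slow tier.
                    slow.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** New records always start on the fast tier. */
    void put(String id, byte[] data) {
        slow.remove(id);
        fast.put(id, data);
    }

    /** Returns the record, promoting it to the fast tier if it was demoted. */
    byte[] get(String id) {
        byte[] data = fast.get(id);
        if (data == null) {
            data = slow.remove(id);
            if (data != null) {
                fast.put(id, data);
            }
        }
        return data;
    }

    boolean isOnFastTier(String id) {
        return fast.containsKey(id);
    }
}
```

A real implementation would also have to deal with concurrent access, moving
files safely between disks, and garbage collection across both tiers, which
is presumably where the tricky parts are.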

Regards,
Thomas

Re: change proposal DataStore

Posted by JOSE FELIX HERNANDEZ BARRIO <jo...@isthari.com>.
Are you suggesting using a different repository for each customer?
Would the administrative work be very high in such a scenario?




2010/1/24 Guo Du <mr...@gmail.com>

> On Sun, Jan 24, 2010 at 11:28 AM, JOSE FELIX HERNANDEZ BARRIO
> <jo...@isthari.com> wrote:
> > - Assign a datastore per workspace (customer) so it's possible to measure
> > (and limit) storage usage for a given customer
> You may be looking for a repository as the multi-tenant solution instead
> of a workspace:
>
> http://wiki.apache.org/jackrabbit/DavidsModel#Rule_.233:_Workspaces_are_for_clone.28.29.2C_merge.28.29_and_update.28.29
> .
>
> Different workspaces share the version history inside the same repository,
> which isn't what you want.
>
> Repositories are independent from each other, so this will be able to
> meet all your requirements.
>
> -Guo
>




Re: change proposal DataStore

Posted by Guo Du <mr...@gmail.com>.
On Sun, Jan 24, 2010 at 11:28 AM, JOSE FELIX HERNANDEZ BARRIO
<jo...@isthari.com> wrote:
> - Assign a datastore per workspace (customer) so it's possible to measure
> (and limit) storage usage for a given customer
You may be looking for a repository as the multi-tenant solution instead of a workspace:
http://wiki.apache.org/jackrabbit/DavidsModel#Rule_.233:_Workspaces_are_for_clone.28.29.2C_merge.28.29_and_update.28.29.

Different workspaces share the version history inside the same repository,
which isn't what you want.

Repositories are independent from each other, so this will be able to
meet all your requirements.

-Guo