Posted to dev@jackrabbit.apache.org by JOSE FELIX HERNANDEZ BARRIO <jo...@isthari.com> on 2010/01/24 12:28:28 UTC
change proposal DataStore
Our goal is to be able to create more complex policies for the datastore
component, for example:
- Assign a datastore per workspace (customer) so it's possible to measure
(and limit) storage usage for a given customer
- Dynamic allocation, so newer or more accessed nodes will be stored on
faster disk and old nodes will be moved to slower SATA disk
What is the group's opinion?
To carry out this task, we propose the following changes:
- modify the workspace DTD to allow inclusion of a datastore configuration
- modify the configuration reader/factory for workspaces
- modify the datastore interface to pass the additional information the new
features need: node info, workspace info ...
--
Jose Hernandez
675599600
Isthari
http://www.isthari.com
Re: change proposal DataStore
Posted by Alexander Klimetschek <ak...@day.com>.
On Mon, Jan 25, 2010 at 11:04, JOSE FELIX HERNANDEZ BARRIO
<jo...@isthari.com> wrote:
> I think that if we extend the datastore interface, especially the function:
> DataRecord addRecord(InputStream stream)
> to add more information, for example: workspace name, node name, property name ...
> it would be possible to create more complex implementations.
That would be against a core principle of the data store. It is
independent from the JCR structure and a record is always only added
once, i.e. if there are multiple usages of the same binary (= file) in
the JCR, it will be stored only once in the data store. The addRecord
method will then only be called the first time.
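
To illustrate the principle described above, here is a minimal sketch of a content-addressed store, assuming records are identified by a digest of their bytes. It is not Jackrabbit's implementation: the class and method names are hypothetical, the store is an in-memory map, and SHA-1 is an assumed digest choice. It shows why such a record has no single owning workspace or node: identical content always maps to the same identifier, however many places use it.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a content-addressed data store. */
class ContentAddressedStore {
    // Maps the hex digest of the content to the stored bytes.
    private final Map<String, byte[]> records = new HashMap<>();

    /** Stores the content if it is new and returns its identifier (the digest). */
    String addRecord(byte[] content) {
        String id = sha1Hex(content);
        // Identical content yields the same identifier, so it is kept only once.
        records.putIfAbsent(id, content.clone());
        return id;
    }

    int recordCount() {
        return records.size();
    }

    // SHA-1 is an assumption made for this sketch.
    private static String sha1Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}

class ContentAddressedStoreDemo {
    public static void main(String[] args) {
        ContentAddressedStore store = new ContentAddressedStore();
        byte[] file = "same binary".getBytes(StandardCharsets.UTF_8);
        String first = store.addRecord(file);   // e.g. used by one node
        String second = store.addRecord(file);  // same binary used elsewhere
        System.out.println("same id: " + first.equals(second));
        System.out.println("records stored: " + store.recordCount());
    }
}
```

Because the identifier depends only on the bytes, attaching a workspace or node name to a record would be ambiguous as soon as two places share the same binary.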
Regards,
Alex
--
Alexander Klimetschek
alexander.klimetschek@day.com
Re: change proposal DataStore
Posted by Thomas Müller <th...@day.com>.
Hi,
> extend the datastore interface
> workspace name, node name, property name ...
I'm not sure if the workspace / node name / node identifier / property name
is always available.
One advantage of this addition would be: it could speed up garbage
collection. If a binary object "knows" the node identifier(s), garbage
collection could check the large objects first (because you could keep links
from the binary object to the place where it is / was used). I'm not
completely against such a change, however for the given problem (accounting)
it does sound like the wrong solution.
Regards,
Thomas
Re: change proposal DataStore
Posted by JOSE FELIX HERNANDEZ BARRIO <jo...@isthari.com>.
I think that if we extend the datastore interface, especially the function:
DataRecord addRecord(InputStream stream)
to add more information, for example: workspace name, node name, property
name ...
it would be possible to create more complex implementations.
2010/1/25 Thomas Müller <th...@day.com>
> Hi,
>
> Currently there is only one data store per repository. If you need a data
> store per workspace, then you need one repository per workspace.
>
> > - Assign a datastore per workspace (customer) so it's possible to measure
> > (and limit) storage usage for a given customer
>
> This sounds more like an "accounting problem" than a "technical
> problem". Could you add some accounting code to the application? For
> example, use an ObservationListener to calculate the disk space used
> by a user (workspace). Or, use a wrapper around the input stream and
> measure / limit storage like this.
>
I will try the ObservationListener; it sounds interesting.
>
> > - Dynamic allocation, so newer or more accessed nodes will be stored on
> > faster disk and old nodes will be moved to slower SATA disk
>
> Such a 'caching data store' would be nice. It's a bit tricky to
> implement I think. Currently, there is no such implementation, however
> patches are always welcome.
>
I'm about to start working on this implementation; I will open a new thread
on the list as soon as I have some interesting results.
>
> Regards,
> Thomas
>
Re: change proposal DataStore
Posted by Thomas Müller <th...@day.com>.
Hi,
Currently there is only one data store per repository. If you need a data
store per workspace, then you need one repository per workspace.
> - Assign a datastore per workspace (customer) so it's possible to measure
> (and limit) storage usage for a given customer
This sounds more like an "accounting problem" than a "technical
problem". Could you add some accounting code to the application? For
example, use an ObservationListener to calculate the disk space used
by a user (workspace). Or, use a wrapper around the input stream and
measure / limit storage like this.
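
The input stream wrapper mentioned above could look roughly like the following sketch. It uses only java.io; the names QuotaInputStream and QuotaDemo are hypothetical, and how the limit for a given customer is obtained is left open. The application would hand the wrapped stream to the repository wherever it currently passes the raw upload stream.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper that counts bytes read and fails once a limit is exceeded. */
class QuotaInputStream extends FilterInputStream {
    private final long limit;
    private long count;

    QuotaInputStream(InputStream in, long limit) {
        super(in);
        this.limit = limit;
    }

    /** Number of bytes read through this wrapper so far. */
    long getCount() {
        return count;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            record(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            record(n);
        }
        return n;
    }

    // Report mark/reset as unsupported so callers do not re-read bytes
    // and distort the count.
    @Override
    public boolean markSupported() {
        return false;
    }

    private void record(long n) throws IOException {
        count += n;
        if (count > limit) {
            throw new IOException("storage quota of " + limit + " bytes exceeded");
        }
    }
}

class QuotaDemo {
    /** Drains the data through the wrapper; returns bytes read, or -1 if over quota. */
    static long drain(byte[] data, long limit) {
        try (QuotaInputStream in =
                 new QuotaInputStream(new ByteArrayInputStream(data), limit)) {
            byte[] buf = new byte[4];
            while (in.read(buf) != -1) {
                // A real caller would pass the wrapped stream to the repository instead.
            }
            return in.getCount();
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(drain(new byte[10], 100)); // within the limit
        System.out.println(drain(new byte[10], 5));   // exceeds the limit
    }
}
```

After a successful write, getCount() gives the size to add to the customer's running total, so the same wrapper covers both measuring and limiting.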
> - Dynamic allocation, so newer or more accessed nodes will be stored on
> faster disk and old nodes will be moved to slower SATA disk
Such a 'caching data store' would be nice. It's a bit tricky to
implement I think. Currently, there is no such implementation, however
patches are always welcome.
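
As a rough illustration of the idea, the sketch below keeps recently used records in a small "fast" tier and demotes the least recently used ones to a "slow" tier, promoting them again on access. Everything here is an assumption made for illustration: the TieredStore name, the use of in-memory maps in place of real disks, and least-recently-used as the demotion policy. A real caching data store would also have to deal with concurrency, failures while moving data, and garbage collection, which is part of what makes it tricky.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier store: recently used records stay in the fast tier. */
class TieredStore {
    // Stand-in for the slower, larger storage.
    private final Map<String, byte[]> slowTier = new HashMap<>();
    // Stand-in for the faster, smaller storage, ordered by access.
    private final LinkedHashMap<String, byte[]> fastTier;

    TieredStore(final int fastCapacity) {
        // accessOrder = true orders entries from least to most recently used.
        this.fastTier = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                if (size() > fastCapacity) {
                    // Demote the least recently used record instead of dropping it.
                    slowTier.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    /** New records start in the fast tier. */
    void put(String id, byte[] data) {
        slowTier.remove(id);
        fastTier.put(id, data);
    }

    /** Returns the record, promoting it to the fast tier if it was demoted. */
    byte[] get(String id) {
        byte[] data = fastTier.get(id); // also marks the record as recently used
        if (data == null) {
            data = slowTier.remove(id);
            if (data != null) {
                fastTier.put(id, data);
            }
        }
        return data;
    }

    boolean inFastTier(String id) {
        return fastTier.containsKey(id);
    }

    boolean inSlowTier(String id) {
        return slowTier.containsKey(id);
    }
}

class TieredStoreDemo {
    public static void main(String[] args) {
        TieredStore store = new TieredStore(2);
        store.put("a", new byte[] {1});
        store.put("b", new byte[] {2});
        store.get("a");                 // "a" is now more recently used than "b"
        store.put("c", new byte[] {3}); // fast tier is full, so "b" is demoted
        System.out.println("b in slow tier: " + store.inSlowTier("b"));
        System.out.println("a in fast tier: " + store.inFastTier("a"));
    }
}
```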
Regards,
Thomas
Re: change proposal DataStore
Posted by JOSE FELIX HERNANDEZ BARRIO <jo...@isthari.com>.
Are you suggesting using a different repository for each customer?
Would the administrative work be very high in such a scenario?
2010/1/24 Guo Du <mr...@gmail.com>
> On Sun, Jan 24, 2010 at 11:28 AM, JOSE FELIX HERNANDEZ BARRIO
> <jo...@isthari.com> wrote:
> > - Assign a datastore per workspace (customer) so it's possible to measure
> > (and limit) storage usage for a given customer
> You may be looking for a repository as the multi-tenant solution instead of
> a workspace:
>
> http://wiki.apache.org/jackrabbit/DavidsModel#Rule_.233:_Workspaces_are_for_clone.28.29.2C_merge.28.29_and_update.28.29
> .
>
> Different workspaces share version history inside the same repository, which
> isn't what you want.
>
> Repositories are independent from each other, so that will be able to meet
> all your requirements.
>
> -Guo
>
Re: change proposal DataStore
Posted by Guo Du <mr...@gmail.com>.
On Sun, Jan 24, 2010 at 11:28 AM, JOSE FELIX HERNANDEZ BARRIO
<jo...@isthari.com> wrote:
> - Assign a datastore per workspace (customer) so it's possible to measure
> (and limit) storage usage for a given customer
You may be looking for a repository as the multi-tenant solution instead of a workspace:
http://wiki.apache.org/jackrabbit/DavidsModel#Rule_.233:_Workspaces_are_for_clone.28.29.2C_merge.28.29_and_update.28.29.
Different workspaces share version history inside the same repository, which
isn't what you want.
Repositories are independent from each other, so that will be able to meet
all your requirements.
-Guo