Posted to users@jackrabbit.apache.org by Premkumar Stephen <pr...@gmail.com> on 2010/02/26 13:02:48 UTC

Storing a jackrabbit repository as versioned files in git

Hello folks,

This is my first post here.

We are using the Jackrabbit repository that comes pre-built into a
business rules solution, Drools Guvnor (http://www.jboss.org/drools/drools-guvnor.html).

As a standard practice, we store these files in git.

However, because of the way Jackrabbit stores them, we cannot perform
operations such as diffs and merges on them.

Could each versioned asset be stored in its own file, so that git can
track changes to it?

Regards,
Prem

Re: Storing a jackrabbit repository as versioned files in git

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Feb 26, 2010 at 19:18, Premkumar Stephen <pr...@gmail.com> wrote:
> Can there be a persistence manager that generally does not need to be
> optimized but could persist just the tree structure as shown on the API
> side?

This is difficult because:
- you would need a way to map properties onto the file system (which
requires special "hidden" files or resource forks);
- the current persistence interface is a simple key-value store, where
the key is the UUID of the node and the value is all of the node's data
(properties, child node pointers, parent pointer, metadata); the
persistence manager (PM) does not know about the JCR-level hierarchy;
- ensuring consistency and transactionality is difficult without a log
and an index, and thus often without a custom file format (*).
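The second point can be illustrated with a toy model in Python. ToyKeyValuePM, NodeBundle, add_child and resolve are hypothetical names, not Jackrabbit's actual PersistenceManager interface, and the "format" property is made up. Each node's data is stored under its UUID, and a path such as /rules/discount exists only as a chain of child pointers that has to be walked from the root, so nothing in the stored keys mirrors the JCR tree.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class NodeBundle:
    """All data of one node, stored under a single key (toy layout)."""
    name: str
    parent_id: Optional[str]
    properties: Dict[str, object] = field(default_factory=dict)
    child_ids: List[str] = field(default_factory=list)


class ToyKeyValuePM:
    """Hypothetical persistence manager: a flat map from node UUID to bundle.

    It never sees a JCR path; it only loads and stores bundles by key."""

    def __init__(self) -> None:
        self._bundles: Dict[str, NodeBundle] = {}

    def store(self, node_id: str, bundle: NodeBundle) -> None:
        self._bundles[node_id] = bundle

    def load(self, node_id: str) -> NodeBundle:
        return self._bundles[node_id]

    def keys(self) -> List[str]:
        return list(self._bundles)


def add_child(pm: ToyKeyValuePM, parent_id: str, name: str, **props: object) -> str:
    """Create a node; the parent's bundle is rewritten to record the child pointer."""
    node_id = str(uuid.uuid4())
    pm.store(node_id, NodeBundle(name, parent_id, dict(props)))
    parent = pm.load(parent_id)
    parent.child_ids.append(node_id)
    pm.store(parent_id, parent)
    return node_id


def resolve(pm: ToyKeyValuePM, root_id: str, path: str) -> str:
    """The hierarchy lives above the PM: walk child pointers one segment at a time."""
    current = root_id
    for segment in filter(None, path.split("/")):
        children = [c for c in pm.load(current).child_ids if pm.load(c).name == segment]
        if not children:
            raise KeyError(path)
        current = children[0]
    return current


root = str(uuid.uuid4())
pm = ToyKeyValuePM()
pm.store(root, NodeBundle("", None))
rules = add_child(pm, root, "rules")
asset = add_child(pm, rules, "discount", format="drl")
```

Listing pm.keys() here shows three UUIDs and no path at all, which is roughly why a file-level diff of such a store says little about individual assets.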

(*) The XmlPersistenceManager is probably the simplest: it stores nodes
as XML files, but it derives the file system structure by splitting up
the node UUID rather than following the original hierarchy. It is also
not safe whatsoever, and hence not recommended at all.
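A minimal sketch of the idea in the (*) note, assuming a made-up splitting scheme (the real XmlPersistenceManager layout is not reproduced here, and node_file_path is a hypothetical name). Directories are derived from the leading hex digits of the node UUID, so two sibling assets under the same parent end up in unrelated directories.

```python
import uuid
from pathlib import PurePosixPath


def node_file_path(node_id: str, depth: int = 3, width: int = 2) -> PurePosixPath:
    """Map a node UUID to a storage path (hypothetical scheme).

    The first `depth` chunks of `width` hex digits become nested directories,
    so files are spread by UUID, not by the node's position in the JCR tree."""
    hex_id = uuid.UUID(node_id).hex  # 32 lowercase hex digits, dashes removed
    dirs = [hex_id[i * width:(i + 1) * width] for i in range(depth)]
    return PurePosixPath(*dirs, node_id, "node.xml")


# Two hypothetical sibling assets, e.g. /rules/discount and /rules/tax:
discount = node_file_path("12345678-9abc-def0-1234-56789abcdef0")
tax = node_file_path("fedcba98-7654-3210-fedc-ba9876543210")
print(discount)  # 12/34/56/12345678-9abc-def0-1234-56789abcdef0/node.xml
print(tax)       # fe/dc/ba/fedcba98-7654-3210-fedc-ba9876543210/node.xml
```

Spreading files by UUID keeps directories small, but a git diff of the repository home then does not line up with the asset tree.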

Note that Jackrabbit is much more than a document (file-oriented)
store: it allows much finer granularity, with many small properties
(useful for, e.g., a CMS). A normal file system is not well suited
to that.

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: Storing a jackrabbit repository as versioned files in git

Posted by Premkumar Stephen <pr...@gmail.com>.
Hi Janne,

Thanks for letting me know about Priha.
It would be great if I could use git, or any other VCS, to store this data
just like any other source files (in Drools Guvnor, "code" is stored in the
repository).

I do not plan to modify any files using git.
Primarily, I want to store and maintain versions using git (or maybe
another VCS).
I would like to be able to do diffs, and it would help if each asset were
in its own file (at least the assets that are versioned).
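The one-file-per-asset layout described here could look like the following sketch. It assumes the text of each versioned asset has already been read out of the repository through the API (that step is not shown); write_assets and the .txt suffix are hypothetical. Each asset lands in a file whose path mirrors its repository path, so git can diff assets individually without touching Jackrabbit's own files.

```python
from pathlib import Path
from typing import List, Mapping


def write_assets(assets: Mapping[str, str], out_dir: Path, suffix: str = ".txt") -> List[Path]:
    """Write each asset to its own file, mirroring its repository path.

    `assets` maps an absolute path such as "/rules/discount" to the asset's
    text (hypothetical input). The suffix keeps an asset file from colliding
    with a directory of the same name (e.g. "/rules" vs "/rules/discount")."""
    written: List[Path] = []
    for repo_path, text in sorted(assets.items()):
        segments = [s for s in repo_path.split("/") if s]
        # Refuse empty paths and anything that could escape out_dir.
        if not segments or any(s in (".", "..") for s in segments):
            raise ValueError(f"unsafe asset path: {repo_path!r}")
        target = out_dir.joinpath(*segments[:-1], segments[-1] + suffix)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written
```

Committing out_dir instead of the repository home gives git something it can diff asset by asset; the repository itself stays the source of truth.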


Regards,
Prem


On Sun, Feb 28, 2010 at 4:40 AM, Janne Jalkanen <Ja...@ecyrd.com> wrote:

>
> The version control as such shouldn't really be a problem, as long as you
> don't actually modify the files (essentially using git as a backup system).
> But modifying the files needs rather deep understanding of what is going on,
> so I wouldn't recommend it.
>
> I'm not sure whether anyone has written a Jackrabbit PM which stores the
> files as-is.  I know that Priha (www.priha.org) has one.  But if you just
> do backup, then the actual physical structure shouldn't matter anyway.
>
> /Janne

Re: Storing a jackrabbit repository as versioned files in git

Posted by Janne Jalkanen <Ja...@ecyrd.com>.
The version control as such shouldn't really be a problem, as long as
you don't actually modify the files (essentially using git as a backup
system). Modifying the files, however, requires a rather deep
understanding of what is going on, so I wouldn't recommend it.

I'm not sure whether anyone has written a Jackrabbit PM that stores
the files as-is; I know that Priha (www.priha.org) has one. But if
you only use git for backup, the actual physical structure shouldn't
matter anyway.
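A minimal sketch of this backup-only workflow, where the application is shut down before doing version control. It assumes the repository home is already a git working tree and that Jackrabbit keeps a `.lock` file there while running and removes it on a clean shutdown (verify this for your version); looks_shut_down and snapshot are hypothetical names. The snapshot only stages and commits; it never edits the repository files.

```python
import subprocess
from pathlib import Path

# Assumption: Jackrabbit keeps a lock file with this name in the repository
# home while it runs and deletes it on a clean shutdown. Check your version.
LOCK_FILE = ".lock"


def looks_shut_down(repo_home: Path) -> bool:
    """Heuristic only: a stale lock left by a crash also reads as 'running'."""
    return not (repo_home / LOCK_FILE).exists()


def _git(repo_home: Path, *args: str) -> str:
    result = subprocess.run(["git", *args], cwd=repo_home, check=True,
                            capture_output=True, text=True)
    return result.stdout


def snapshot(repo_home: Path, message: str) -> bool:
    """Commit the repository home as a read-only backup; never edits its files.

    Assumes `repo_home` is already a git working tree (e.g. after `git init`).
    Returns True if a commit was made, False if nothing had changed."""
    if not looks_shut_down(repo_home):
        raise RuntimeError(f"{repo_home} appears to be in use; shut down Jackrabbit first")
    _git(repo_home, "add", "--all")
    if not _git(repo_home, "status", "--porcelain").strip():
        return False  # staged tree matches the last commit
    _git(repo_home, "commit", "-m", message)
    return True
```

Restoring means checking out a commit while Jackrabbit is stopped; the files are treated as opaque, so their physical layout does not matter.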

/Janne

On Feb 26, 2010, at 20:18, Premkumar Stephen wrote:

> Hi Alex,
>
> Can there be a persistence manager that generally does not need to be
> optimized but could persist just the tree structure as shown on the API
> side?
> I am not sure what this is being optimized on, but I could choose to not
> optimize on space, etc.
> In that case, would reads get slower?
>
> I agree, there should not be external entities changing the files while it
> is running.
> We shut down our app (which internally shuts down jackrabbit) and then do
> our version control.
>
> Could you help me get started with the implementation of such a persistence,
> or is this too deep and complicated for a newbie?
>
> Regards,
> Prem


Re: Storing a jackrabbit repository as versioned files in git

Posted by Premkumar Stephen <pr...@gmail.com>.
Hi Alex,

Can there be a persistence manager that generally does not need to be
optimized but could persist just the tree structure as shown on the API
side?
I am not sure what the current one is optimized for, but I could choose
not to optimize for space, etc.
In that case, would reads get slower?

I agree that no external process should change the files while
Jackrabbit is running.
We shut down our app (which internally shuts down Jackrabbit) and then
do our version control.

Could you help me get started with implementing such a persistence
manager, or is this too deep and complicated for a newbie?

Regards,
Prem


On Fri, Feb 26, 2010 at 12:10 PM, Alexander Klimetschek <ak...@day.com> wrote:

> On Fri, Feb 26, 2010 at 13:02, Premkumar Stephen <pr...@gmail.com> wrote:
> > We are using the jackrabbit repository (which comes pre-built into a
> > Business Rules solution - http://www.jboss.org/drools/drools-guvnor.html)
> >
> > Now, as a standard, we use git to store these files.
>
> Which files? The files beneath the repository home directory? What
> persistence manager do you use?
>
> > However, because of the way jackrabbit stores them, it becomes impossible to
> > do operations such as diffs, merges, etc.
> >
> > For those assets that are versioned, can they each be in their own file, so
> > that git can take care of changes to them?
>
> IIUC and you want to store the repository files (whatever persistence
> manager and data store you are using) so that you can version them,
> this is generally not possible. Jackrabbit's persistence architecture
> is optimized and doesn't necessarily represent the tree structure seen
> on the JCR API side. Also, it is not advisable to change those files
> while Jackrabbit is running.
>
> However, JCR has versioning built-in, so you can use that for
> versioning purposes. Note that actual diff/merge tools are not
> included in the JCR API (since it's too app and file format specific).
>
> Regards,
> Alex
>
> --
> Alexander Klimetschek
> alexander.klimetschek@day.com
>

Re: Storing a jackrabbit repository as versioned files in git

Posted by Alexander Klimetschek <ak...@day.com>.
On Fri, Feb 26, 2010 at 13:02, Premkumar Stephen <pr...@gmail.com> wrote:
> We are using the jackrabbit repository (which comes pre-built into a
> Business Rules solution - http://www.jboss.org/drools/drools-guvnor.html)
>
> Now, as a standard, we use git to store these files.

Which files? The files beneath the repository home directory? What
persistence manager do you use?

> However, because of the way jackrabbit stores them, it becomes impossible to
> do operations such as diffs, merges, etc.
>
> For those assets that are versioned, can they each be in their own file, so
> that git can take care of changes to them?

If I understand correctly, you want to store the repository files
(whatever persistence manager and data store you are using) so that you
can version them. This is generally not possible: Jackrabbit's
persistence architecture is optimized and doesn't necessarily reflect
the tree structure seen on the JCR API side. Also, it is not advisable
to change those files while Jackrabbit is running.

However, JCR has built-in versioning, so you can use that for
versioning purposes. Note that actual diff/merge tools are not
included in the JCR API, since they are too application- and
file-format-specific.
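Since the diff itself is left to the application, here is a minimal sketch of a property-level comparison between two versions of a node. It assumes the string properties of each version have already been read into a dict, for example from the jcr:frozenNode of two versions in the node's version history; diff_versions and the sample property names are hypothetical.

```python
import difflib
from typing import Dict, List


def diff_versions(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """Report property-level changes between two versions of a node.

    Each dict maps property name -> string value (hypothetical input shape)."""
    report: List[str] = []
    for name in sorted(old.keys() | new.keys()):
        before, after = old.get(name), new.get(name)
        if before == after:
            continue
        if before is None:
            report.append(f"+ {name} (added)")
        elif after is None:
            report.append(f"- {name} (removed)")
        else:
            # Text-valued property changed: show a line-level unified diff.
            report.append(f"~ {name}")
            report.extend(difflib.unified_diff(
                before.splitlines(), after.splitlines(),
                fromfile=f"{name}@old", tofile=f"{name}@new", lineterm=""))
    return report


v1 = {"content": "when\n  x > 1\nthen", "format": "drl"}
v2 = {"content": "when\n  x > 2\nthen", "description": "raised threshold"}
print("\n".join(diff_versions(v1, v2)))
```

A merge would need a similar, format-aware step on top of this, which is the application- and format-specific part the JCR API leaves out.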

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com