
Posted to dev@jackrabbit.apache.org by Stefan Kurla <st...@gmail.com> on 2007/04/27 17:00:05 UTC

Fwd: understanding jackrabbit datastorage

I guess this is more suited for the dev list. I could figure this out
by tracing the code, but I would appreciate it if someone who has
already worked on this could explain. It would help me understand the
system a little better and save a bit of time.

How is the data actually stored in Jackrabbit, say when using MySQL
and just the default workspace?

Say the structure is
/
--folderA:nt:folder (propertyX:references fileB)
----fileA:nt:file
--fileB:nt:file

There is the default_binval table, which has binval_id and binval_data.
### Is this table used to store binary data, where binval_id is the
uuid of the jcr:content that this is referring to and binval_data is
the actual bytestream blob data?

There is default_node which has node_id and node_data.
###How is this used?

default_prop with prop_id and prop_data
###How is this used?

default_refs with node_id and refs_data
###How is this used?

My question then is how the database would store the UUIDs or nodes of
the structure defined above. It is a very simple structure, but
understanding how it is actually translated into database rows would
be helpful.

Thanks.

Re: understanding jackrabbit datastorage

Posted by Stefan Kurla <st...@gmail.com>.

Thanks, it does help.



Re: understanding jackrabbit datastorage

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 4/27/07, Stefan Kurla <st...@gmail.com> wrote:
> I guess this is more suited for the dev list.

Yep.

> How is the data actually stored in jackrabbit say using mysql for
> example and we are just using the default workspace.

A good starting point in understanding the underlying storage model of
Jackrabbit is to look at the PersistenceManager interface [1]. The
actual physical storage model depends on the persistence manager
implementation you are using, but the logical model is fixed by the
interface.

The PersistenceManager abstraction essentially treats all nodes and
properties as individually addressable items that each have their own
unique identifier. In addition to these items the interface also
defines a mechanism to store and access all the references pointing to
a node.
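
The logical model described above can be illustrated with a small
sketch. This is not the PersistenceManager interface itself, only a toy
in-memory stand-in: the class name InMemoryStore, its method names, and
the "uuid-..." identifiers are all hypothetical and chosen for
illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Toy stand-in for the logical model: id -> state, plus id -> references.

    All names here are hypothetical; this is not Jackrabbit's API.
    """
    nodes: dict = field(default_factory=dict)       # node id -> node state
    properties: dict = field(default_factory=dict)  # property id -> property state
    references: dict = field(default_factory=dict)  # target node id -> referencing property ids

    def store_node(self, node_id, state):
        self.nodes[node_id] = state

    def store_property(self, prop_id, state):
        self.properties[prop_id] = state

    def add_reference(self, target_node_id, prop_id):
        # References are kept separately, keyed by the node they point to.
        self.references.setdefault(target_node_id, []).append(prop_id)

    def load_references(self, target_node_id):
        return self.references.get(target_node_id, [])


store = InMemoryStore()
store.store_node("uuid-fileB", {"type": "nt:file"})
# A property is identified here by (parent node id, property name); this is an assumption.
store.store_property(("uuid-folderA", "propertyX"),
                     {"type": "Reference", "values": ["uuid-fileB"]})
store.add_reference("uuid-fileB", ("uuid-folderA", "propertyX"))
```

Looking up store.load_references("uuid-fileB") then answers "which
properties point at fileB" without scanning every property.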

> There is the default_binval which has binval_id and binval_data.
> ### Is this table used to store binary data, where binval_id is the
> uuid of the jcr:content that this is referring to and binval_data is
> the actual bytestream blob data

Yes, the binval table stores binary properties when the externalBLOBs
configuration option is set to "false".

The binval_id column contains the property identifier plus value index
(because of multivalued properties) used to identify the binary value,
and the binval_data column contains the actual byte stream.
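
To see why the value index is part of the key, here is a minimal
sketch. The key format below (property identifier followed by the index
in brackets) is purely an assumption for illustration and is not
necessarily the format Jackrabbit writes; the property identifier
"uuid-content/attachments" is hypothetical as well.

```python
def binval_key(property_id: str, value_index: int) -> str:
    # Hypothetical key format: Jackrabbit's real binval_id layout may differ.
    return f"{property_id}[{value_index}]"


# A multi-valued binary property with two values needs two distinct rows,
# so each value gets its own key derived from the same property identifier.
keys = [binval_key("uuid-content/attachments", i) for i in range(2)]
```

Without the index, both values of a multi-valued property would map to
the same binval_id and overwrite each other.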

> There is default_node which has node_id and node_data.
> ###How is this used?

The node_id column contains the unique node identifier and the
node_data column contains the node state in a serialized format [2].

> default_prop with prop_id and prop_data
> ###How is this used?

The prop_id column contains the property identifier, and the prop_data
column contains the property state in a serialized format [2].

> default_refs with node_id and refs_data
> ###How is this used?

The node_id contains the identifier of the reference target node, and
the refs_data contains the list of referencing property identifiers in
a serialized format [2].

> Say the structure is
> /
> --folderA:nt:folder (propertyX:references fileB)
> ----fileA:nt:file
> --fileB:nt:file
> [...]
> My question then is how would the database store the uuids or nodes of
> the structure that is defined above. Very simple structure but to
> understand how this structure is actually translated to be stored in
> the database would be helpful.

You'd have four node rows: the root node, folderA, fileA, and fileB.
The serialized node_data part of the root and folderA nodes would
contain the node identifiers of the child nodes (folderA and fileB
for the root node, and fileA for folderA).

All properties would be stored in the property table. Additionally the
reference from propertyX to fileB would be stored as a separate refs
row with the fileB UUID as the node_id value and a serialized property
identifier list that contains just the propertyX identifier as the
refs_data value.
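
The row layout described above can be sketched with an in-memory SQLite
database. Several things here are assumptions made for illustration:
the column types, the use of JSON in place of Jackrabbit's binary
serialization format, the short readable identifiers in place of UUIDs,
and the "folderA/propertyX" property identifier format. The sketch also
leaves out the jcr:content child nodes and the jcr:primaryType
properties that a real repository would store.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE default_node (node_id TEXT PRIMARY KEY, node_data BLOB);
    CREATE TABLE default_prop (prop_id TEXT PRIMARY KEY, prop_data BLOB);
    CREATE TABLE default_refs (node_id TEXT PRIMARY KEY, refs_data BLOB);
""")


def pack(value):
    # JSON stands in for the real serialized format (assumption).
    return json.dumps(value).encode("utf-8")


def unpack(blob):
    return json.loads(blob.decode("utf-8"))


# Four node rows; each node state lists the identifiers of its children.
nodes = {
    "root": {"children": ["folderA", "fileB"]},
    "folderA": {"children": ["fileA"]},
    "fileA": {"children": []},
    "fileB": {"children": []},
}
for node_id, state in nodes.items():
    db.execute("INSERT INTO default_node VALUES (?, ?)", (node_id, pack(state)))

# One property row for propertyX, plus one refs row keyed by the target node.
prop_id = "folderA/propertyX"  # hypothetical identifier format
db.execute("INSERT INTO default_prop VALUES (?, ?)",
           (prop_id, pack({"type": "Reference", "values": ["fileB"]})))
db.execute("INSERT INTO default_refs VALUES (?, ?)", ("fileB", pack([prop_id])))


def children(node_id):
    row = db.execute("SELECT node_data FROM default_node WHERE node_id = ?",
                     (node_id,)).fetchone()
    return unpack(row[0])["children"]


def referrers(node_id):
    row = db.execute("SELECT refs_data FROM default_refs WHERE node_id = ?",
                     (node_id,)).fetchone()
    return unpack(row[0]) if row else []
```

Walking the tree means loading a node row and following the child
identifiers in its node_data, while referrers("fileB") reads the single
refs row to find propertyX.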

I hope this description helps. Note that this only applies to the
traditional database persistence managers. The new bundle persistence
managers in Jackrabbit 1.3 work a bit differently, though the same
identifier->data structure is still in use.

BR,

Jukka Zitting

[1] http://jackrabbit.apache.org/api/1.2.1/org/apache/jackrabbit/core/persistence/PersistenceManager.html
[2] http://jackrabbit.apache.org/api/1.2.1/org/apache/jackrabbit/core/persistence/util/Serializer.html