Posted to dev@jackrabbit.apache.org by Jukka Zitting <ju...@gmail.com> on 2007/04/23 20:30:40 UTC

NGP: Value records

Hi,

I started prototyping the next generation persistence proposal
discussed before, and would like feedback on an idea on how to store
values in this persistence model.

My idea is to store each value in a unique and immutable "value
record" identified by a "value identifier". Duplicate values are only
stored once in a single value record. This saves space especially when
storing multiple copies of large binary documents and allows value
equality comparisons based on just the identifiers.

A value record would essentially be an array of bytes as defined in
Value.getStream(). In other words the integer value 123 and the string
value "123" would both be stored in the same value record. More
specific typing information would be indicated in the property record
that refers to that value. For example an integer property and a
string property could both point to the same value record, but have
different property types that indicate the default interpretation of
the value.
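
For illustration, here's a rough sketch of how a property record
could pair a type tag with a shared value identifier (the names are
hypothetical; only ValueIdentifier appears in the draft interfaces
below):

    // Hypothetical sketch: the type tag lives in the property record,
    // the raw bytes live in the shared value record.
    class PropertyRecord {
        final int type;                // e.g. PropertyType.LONG or STRING
        final ValueIdentifier valueId; // reference to the shared bytes

        PropertyRecord(int type, ValueIdentifier valueId) {
            this.type = type;
            this.valueId = valueId;
        }
    }

    // A LONG property and a STRING property can share the record
    // holding the bytes "123":
    //     new PropertyRecord(PropertyType.LONG, id);
    //     new PropertyRecord(PropertyType.STRING, id);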

Name and path values are stored as strings using namespace prefixes
from an internal namespace registry. Stability of such values is
enforced by restricting this internal namespace registry to never
remove or modify existing prefix mappings, only new namespace mappings
can be added.
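
As a minimal sketch (a hypothetical class, just to illustrate the
add-only constraint):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: prefix mappings can be added but never
    // remapped or removed, so stored names and paths stay stable.
    class InternalNamespaceRegistry {
        private final Map<String, String> prefixToUri =
            new HashMap<String, String>();

        synchronized void addMapping(String prefix, String uri) {
            String existing = prefixToUri.get(prefix);
            if (existing != null && !existing.equals(uri)) {
                throw new IllegalStateException(
                    "prefix " + prefix + " is already mapped");
            }
            prefixToUri.put(prefix, uri);
        }

        synchronized String getURI(String prefix) {
            return prefixToUri.get(prefix); // stable forever
        }
    }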

Possible Optimizations

Extra metadata can be associated with the value records to avoid
having to parse the binary stream every time the value is accessed as
a typed value. Such metadata could include for example flags that
indicate if the byte array is valid UTF-8 and if it can be interpreted
as an integer, a float, a date, a name, a path, etc. Value records
that can be interpreted as types like integers or dates can also
contain a more efficient binary representation than the string-based
Value.getStream() byte array.
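
For example, the flags could be simple bits computed once when the
record is created (flag names made up for illustration):

    // Hypothetical metadata flags, computed once per value record.
    interface ValueFlags {
        int VALID_UTF8 = 1 << 0; // bytes form a valid UTF-8 string
        int IS_INTEGER = 1 << 1; // parses as an integer
        int IS_DOUBLE  = 1 << 2; // parses as a float
        int IS_DATE    = 1 << 3; // parses as an ISO 8601 date
        int IS_NAME    = 1 << 4; // valid JCR name
        int IS_PATH    = 1 << 5; // valid JCR path
    }

    // Checked before attempting a typed interpretation:
    //     if ((flags & ValueFlags.IS_INTEGER) != 0) { ... }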

Storing values in separate value records violates locality-of-access
and can in the worst case cause separate disk loads for each value
being read. Since value records are immutable it is possible to offset
this problem by caching commonly accessed values in memory or by
putting copies of small values near places where the values are
referenced. For example a multivalued integer property could
internally be stored as an array that contains both the value
identifiers and the actual integers.
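
Roughly like this (a hypothetical layout, not a committed design):

    // Hypothetical sketch: inlined copies of the values are kept next
    // to their identifiers, so reads need no extra disk access while
    // equality checks can still compare just the identifiers.
    class MultiLongProperty {
        final ValueIdentifier[] ids; // canonical value references
        final long[] values;         // inlined copies for fast reads

        MultiLongProperty(ValueIdentifier[] ids, long[] values) {
            assert ids.length == values.length;
            this.ids = ids;
            this.values = values;
        }
    }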

Achieving uniqueness of the value records requires a way to determine
whether an instance of a given value already exists. Some indexing is
needed to avoid having to traverse the entire set of existing value
records for each new value being created. A hash table with chained
entries could easily be managed in an append-only mode for easy
integration with the proposed revision model.
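
A rough sketch of such an index entry (hypothetical; each entry is
written once and chains back to older entries in the same bucket):

    // Hypothetical append-only hash index entry. A bucket points to
    // its newest entry; entries are never modified after writing.
    class HashEntry {
        final int hash;                // hash of the value contents
        final ValueIdentifier valueId; // record with those contents
        final HashEntry previous;      // older entry in this bucket

        HashEntry(int hash, ValueIdentifier valueId, HashEntry previous) {
            this.hash = hash;
            this.valueId = valueId;
            this.previous = previous;
        }
    }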

Draft interfaces

Here's a quick draft of the interfaces for such an implementation:

    interface ValueIdentifier {}

    interface ValueRecord {
        InputStream getStream() throws IOException;
    }

    interface Revision {
        /** Returns the identified value from this or any previous revision. */
        ValueRecord getValue(ValueIdentifier identifier);
    }

    interface DraftRevision extends Revision {
        /**
         * Returns the value identifier of a value record with the
         * given contents. If such a record does not already exist,
         * a new one is created in this revision.
         */
        ValueIdentifier createValue(InputStream stream) throws IOException;
    }

What do you think?

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Tim Kettering <ti...@vivakos.com>.
I've been reading this thread with interest - and one thing I wanted
to comment on: keeping full version history and not cleaning up any
data sounds like a good way to allow anyone to take a snapshot of the
repository at any point in time since its creation and get an accurate
representation, or to "roll back" the repository to a certain time
period.

This may be useful for certain needs...

-tim

On Apr 25, 2007, at 9:55 AM, Jukka Zitting wrote:

> Hi,
>
> On 4/25/07, Alexandru Popescu ☀  
> <th...@gmail.com> wrote:
>> Yes, it is. But not all content is versionable. Is this proposal only
>> for versionable content? If so, I apologize as I have missed this
>> part.
>
> The proposal is for all content, but internally it actually keeps full
> change histories of everything, not just the versionable parts.
>
> BR,
>
> Jukka Zitting


Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> Yes, it is. But not all content is versionable. Is this proposal only
> for versionable content? If so, I apologize as I have missed this
> part.

The proposal is for all content, but internally it actually keeps full
change histories of everything, not just the versionable parts.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 4/25/07, Torgeir Veimo <to...@pobox.com> wrote:
>
> On 25 Apr 2007, at 14:18, Alexandru Popescu ☀ wrote:
>
> >
> > I see. Now my next question is: how do you clean up afterwards? If a
> > value becomes unreferenced, will you or will you not clear it?
>
> Isn't the purpose of a versioned repository to keep all old content?
>

Yes, it is. But not all content is versionable. Is this proposal only
for versionable content? If so, I apologize as I have missed this
part.

./alex
--
.w( the_mindstorm )p.

> --
> Torgeir Veimo
> torgeir@pobox.com
>
>
>
>

Re: NGP: Value records

Posted by Torgeir Veimo <to...@pobox.com>.
On 25 Apr 2007, at 14:18, Alexandru Popescu ☀ wrote:

>
> I see. Now my next question is: how do you clean up afterwards? If a
> value becomes unreferenced, will you or will you not clear it?

Isn't the purpose of a versioned repository to keep all old content?

-- 
Torgeir Veimo
torgeir@pobox.com




Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> On 4/25/07, Jukka Zitting <ju...@gmail.com> wrote:
> > On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> > > I see. Now my next question is: how do you clean up afterwards? If
> > > a value becomes unreferenced, will you or will you not clear it?
> >
> > It won't be cleared unless the repository gets explicitly recreated
> > using some vacuuming process. I don't think this is a problem since
> > there's only a single copy of any unique value.
>
> So, in cases where this solution is used for storing large binary
> values, the benefit of storing a single value with references will be
> largely lost from the perspective of saving storage space.

My main worry about losing storage space concerns cases where you are
versioning large binaries or managing multiple clones of entire
workspaces. In such cases you could easily reach dozens or even
hundreds of copies of your content, with which even cheap storage
wouldn't help. Having extra unused binaries in the repository is not
nearly as bad.

Additionally, there's a good case to be made for never removing anything.

> Sorry for asking so many questions (and maybe sounding a bit negative),
> but I have the feeling that there may be some missing things :-).

No problem, this is exactly why I started this thread. I'm quite
certain I'm missing something or at least misjudging some constraints,
so all feedback is very much welcome.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 4/25/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> > I see. Now my next question is: how do you clean up afterwards? If a
> > value becomes unreferenced, will you or will you not clear it?
>
> It won't be cleared unless the repository gets explicitly recreated
> using some vacuuming process. I don't think this is a problem since
> there's only a single copy of any unique value.
>

So, in cases where this solution is used for storing large binary
values, the benefit of storing a single value with references will be
largely lost from the perspective of saving storage space.

Sorry for asking so many questions (and maybe sounding a bit negative),
but I have the feeling that there may be some missing things :-).

./alex
--
.w( the_mindstorm )p.

> BR,
>
> Jukka Zitting
>

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> I see. Now my next question is: how do you clean up afterwards? If a
> value becomes unreferenced, will you or will you not clear it?

It won't be cleared unless the repository gets explicitly recreated
using some vacuuming process. I don't think this is a problem since
there's only a single copy of any unique value.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 4/25/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> > Another possible problem with the shared values approach is that in a
> > concurrent environment accessing these may become a bottleneck, as you
> > will almost always need to serialize the access. Considering that
> > reading is now a 2-step operation, you will almost always need to
> > synchronize on that access, and so this will lead to serialized
> > access, which does not fit any concurrent environment.
>
> How is that? I don't think any synchronization would be needed since
> the value records would be immutable. You could even have separate
> processes concurrently reading the same value files.
>

I see. Now my next question is: how do you clean up afterwards? If a
value becomes unreferenced, will you or will you not clear it?

./alex
--
.w( the_mindstorm )p.

> BR,
>
> Jukka Zitting
>

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> Another possible problem with the shared values approach is that in a
> concurrent environment accessing these may become a bottleneck, as you
> will almost always need to serialize the access. Considering that
> reading is now a 2-step operation, you will almost always need to
> synchronize on that access, and so this will lead to serialized
> access, which does not fit any concurrent environment.

How is that? I don't think any synchronization would be needed since
the value records would be immutable. You could even have separate
processes concurrently reading the same value files.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 4/25/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> > On 4/23/07, Jukka Zitting <ju...@gmail.com> wrote:
> > > My idea is to store each value in a unique and immutable "value
> > > record" identified by a "value identifier". Duplicate values are only
> > > stored once in a single value record. This saves space especially when
> > > storing multiple copies of large binary documents and allows value
> > > equality comparisons based on just the identifiers.
> > > [...]
> >
> > I may be misreading something, but my main concern with this approach
> > is that while minimizing the size of the storage (which is very cheap
> > right now and almost infinite) it has a penalty on the access
> > performance: needing 2 "I/O" operations for reading a value. The
> > caching strategy may address this problem, but even if memory is also
> > cheap it is still limited. So, while I see this solution as a fit
> > for cases where huge amounts of duplicate data would be stored, for
> > all the other cases I see it as suboptimal.
>
> Good point. Apart from the space savings my main goal was to have
> short constant-length identifiers that could be used for equality
> comparisons instead of comparing the value contents. This would be
> especially beneficial for things like names and paths and probably
> also other medium-length strings, but I agree that the
> locality-of-access issue should be resolved somehow.
>

Do you mean something like RDBMS IDs?

Another possible problem with the shared values approach is that in a
concurrent environment accessing these may become a bottleneck, as you
will almost always need to serialize the access. Considering that
reading is now a 2-step operation, you will almost always need to
synchronize on that access, and so this will lead to serialized
access, which does not fit any concurrent environment.

./alex
--
.w( the_mindstorm )p.
_____________________________________
  Alexandru Popescu, OSS Evangelist
TestNG/Groovy/AspectJ/WebWork/more...
  Information Queue ~ www.InfoQ.com

> BR,
>
> Jukka Zitting
>

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> On 4/23/07, Jukka Zitting <ju...@gmail.com> wrote:
> > My idea is to store each value in a unique and immutable "value
> > record" identified by a "value identifier". Duplicate values are only
> > stored once in a single value record. This saves space especially when
> > storing multiple copies of large binary documents and allows value
> > equality comparisons based on just the identifiers.
> > [...]
>
> I may be misreading something, but my main concern with this approach
> is that while minimizing the size of the storage (which is very cheap
> right now and almost infinite) it has a penalty on the access
> performance: needing 2 "I/O" operations for reading a value. The
> caching strategy may address this problem, but even if memory is also
> cheap it is still limited. So, while I see this solution as a fit
> for cases where huge amounts of duplicate data would be stored, for
> all the other cases I see it as suboptimal.

Good point. Apart from the space savings my main goal was to have
short constant-length identifiers that could be used for equality
comparisons instead of comparing the value contents. This would be
especially beneficial for things like names and paths and probably
also other medium-length strings, but I agree that the
locality-of-access issue should be resolved somehow.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 4/28/07, Miro Walker <mi...@gmail.com> wrote:
> Alex,
>
> On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> > I may be misreading something, but my main concern with this approach
> > is that while minimizing the size of the storage (which is very cheap
> > right now and almost infinite) it has a penalty on the access
> > performance: needing 2 "I/O" operations for reading a value. The
> > caching strategy may address this problem, but even if memory is also
> > cheap it is still limited. So, while I see this solution as a fit
> > for cases where huge amounts of duplicate data would be stored, for
> > all the other cases I see it as suboptimal.
>
> Hm - not sure I agree with the assumption that storage is
> cheap/infinite. Try dealing with backups / etc on a repository that is
> 50GB in size, then try with 100GB+ - it gets to be a major headache.
> Even with lots of bandwidth, copying 100GB over a WAN can do all sorts
> of nasty things, like crash firewalls, etc. With a versioning
> repository using multiple workspaces, disk space usage can grow
> extremely fast and we're finding we have many GB of data, 90%+ of
> which is duplicates. Something like what Jukka is suggesting would
> help enormously. I guess it's one of those "depends on the use case"
> things :-)
>

Miro, I do agree with your points. And I do agree with the "depends on
the use case". Unfortunately, I feel that the current approach may
address this "possible" concern (for which there are some possible
solutions), while raising others for which there is no available
solution (bad performance, possible concurrency bottlenecks). By all
means, I am not saying that this is a totally wrong approach, but I
think there are missing aspects that should be considered upfront
rather than later.

bests,

./alex
--
.w( the_mindstorm )p.

> miro
>

Re: NGP: Value records

Posted by Miro Walker <mi...@gmail.com>.
Alex,

On 4/25/07, Alexandru Popescu ☀ <th...@gmail.com> wrote:
> I may be misreading something, but my main concern with this approach
> is that while minimizing the size of the storage (which is very cheap
> right now and almost infinite) it has a penalty on the access
> performance: needing 2 "I/O" operations for reading a value. The
> caching strategy may address this problem, but even if memory is also
> cheap it is still limited. So, while I see this solution as a fit
> for cases where huge amounts of duplicate data would be stored, for
> all the other cases I see it as suboptimal.

Hm - not sure I agree with the assumption that storage is
cheap/infinite. Try dealing with backups / etc on a repository that is
50GB in size, then try with 100GB+ - it gets to be a major headache.
Even with lots of bandwidth, copying 100GB over a WAN can do all sorts
of nasty things, like crash firewalls, etc. With a versioning
repository using multiple workspaces, disk space usage can grow
extremely fast and we're finding we have many GB of data, 90%+ of
which is duplicates. Something like what Jukka is suggesting would
help enormously. I guess it's one of those "depends on the use case"
things :-)

miro

Re: NGP: Value records

Posted by Alexandru Popescu ☀ <th...@gmail.com>.
On 4/23/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> I started prototyping the next generation persistence proposal
> discussed before, and would like feedback on an idea on how to store
> values in this persistence model.
>
> My idea is to store each value in a unique and immutable "value
> record" identified by a "value identifier". Duplicate values are only
> stored once in a single value record. This saves space especially when
> storing multiple copies of large binary documents and allows value
> equality comparisons based on just the identifiers.
>
> A value record would essentially be an array of bytes as defined in
> Value.getStream(). In other words the integer value 123 and the string
> value "123" would both be stored in the same value record. More
> specific typing information would be indicated in the property record
> that refers to that value. For example an integer property and a
> string property could both point to the same value record, but have
> different property types that indicate the default interpretation of
> the value.
>

I may be misreading something, but my main concern with this approach
is that while minimizing the size of the storage (which is very cheap
right now and almost infinite) it has a penalty on the access
performance: needing 2 "I/O" operations for reading a value. The
caching strategy may address this problem, but even if memory is also
cheap it is still limited. So, while I see this solution as a fit
for cases where huge amounts of duplicate data would be stored, for
all the other cases I see it as suboptimal.

bests,

./alex
--
.w( the_mindstorm )p.
_____________________________________
  Alexandru Popescu, OSS Evangelist
TestNG/Groovy/AspectJ/WebWork/more...
  Information Queue ~ www.InfoQ.com

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/24/07, Julian Reschke <ju...@gmx.de> wrote:
> Jukka Zitting wrote:
> > A value record would essentially be an array of bytes as defined in
> > Value.getStream(). In other words the integer value 123 and the string
> > value "123" would both be stored in the same value record. More
> > specific typing information would be indicated in the property record
> > that refers to that value. For example an integer property and a
> > string property could both point to the same value record, but have
> > different property types that indicate the default interpretation of
> > the value.
>
> This is possible, but does it really help in the real world? Thus I'd
> see that just as a nice-to-have, and be prepared to take it out if it
> makes things harder in practice...

I guess this decision is mostly a nice-to-have: it simplifies some
value conversions and might help with indexing, but most importantly
it keeps the value records simple, as they are essentially just
untyped byte arrays. I wouldn't mind adding type information if
there's a good reason to do that.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Julian Reschke <ju...@gmx.de>.
Jukka Zitting wrote:
> Hi,
> 
> I started prototyping the next generation persistence proposal
> discussed before, and would like feedback on an idea on how to store
> values in this persistence model.
> 
> My idea is to store each value in a unique and immutable "value
> record" identified by a "value identifier". Duplicate values are only
> stored once in a single value record. This saves space especially when
> storing multiple copies of large binary documents and allows value
> equality comparisons based on just the identifiers.

...and gives you a cheap strong ETag for binary content.

> A value record would essentially be an array of bytes as defined in
> Value.getStream(). In other words the integer value 123 and the string
> value "123" would both be stored in the same value record. More
> specific typing information would be indicated in the property record
> that refers to that value. For example an integer property and a
> string property could both point to the same value record, but have
> different property types that indicate the default interpretation of
> the value.

This is possible, but does it really help in the real world? Thus I'd 
see that just as a nice-to-have, and be prepared to take it out if it 
makes things harder in practice...

 > ...

Best regards, Julian

Re: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Johan Stuyts <j....@hippo.nl> wrote:
> > > Storing all values as a binary field will prevent access to
> > the content
> > > using standard database tools. You won't be able to look at
> > the values
> > > in the database without starting Jackrabbit.
> >
> > I don't consider that to be a design goal. The values would most
> > naturally be stored directly on disk without a database anywhere.
>
> IMHO it is worthwhile to keep the data accessible to available tools
> (database viewers, database backup tools, reporting engines, schema/data
> migration tools, etc.). If you do not, you have to write a
> custom tool for the NGP model for each purpose.

I would expect such tools to work against the JCR interface instead of
any underlying persistence model. Note that architecturally I consider
the JCR API to be a parallel alternative to JDBC, not something you
necessarily build on top of JDBC.

BR,

Jukka Zitting

RE: Value records

Posted by Johan Stuyts <j....@hippo.nl>.
> > Storing all values as a binary field will prevent access to 
> the content
> > using standard database tools. You won't be able to look at 
> the values
> > in the database without starting Jackrabbit.
> 
> I don't consider that to be a design goal. The values would most
> naturally be stored directly on disk without a database anywhere.

IMHO it is worthwhile to keep the data accessible to available tools
(database viewers, database backup tools, reporting engines, schema/data
migration tools, etc.). If you do not, you have to write a
custom tool for the NGP model for each purpose.

Johan Stuyts

Re: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/25/07, Johan Stuyts <j....@hippo.nl> wrote:
> > A value record would essentially be an array of bytes as defined in
> > Value.getStream(). In other words the integer value 123 and the string
> > value "123" would both be stored in the same value record. More
>
> Storing all values as a binary field will prevent access to the content
> using standard database tools. You won't be able to look at the values
> in the database without starting Jackrabbit.

I don't consider that to be a design goal. The values would most
naturally be stored directly on disk without a database anywhere.

BR,

Jukka Zitting

RE: Value records

Posted by Johan Stuyts <j....@hippo.nl>.
> A value record would essentially be an array of bytes as defined in
> Value.getStream(). In other words the integer value 123 and the string
> value "123" would both be stored in the same value record. More
> ...

Storing all values as a binary field will prevent access to the content
using standard database tools. You won't be able to look at the values
in the database without starting Jackrabbit.

If the data ever needs to be transformed (migration between application
releases, consistency fixes, etc.) the SQL that has to be written will
be complex. In addition to having to work with vertically stored data,
you now also have to know how to convert a value to its binary
representation. E.g. you cannot work with native SQL dates but have to
work with hexadecimal strings.

Johan Stuyts

Re: NGP: Value records

Posted by Tobias Bocanegra <to...@day.com>.
> > > A value record would essentially be an array of bytes as defined in
> > > Value.getStream(). In other words the integer value 123 and the string
> > > value "123" would both be stored in the same value record. More
> > > specific typing information would be indicated in the property record
> > > that refers to that value. For example an integer property and a
> > > string property could both point to the same value record, but have
> > > different property types that indicate the default interpretation of
> > > the value.
> > i think that with small values we have to keep in mind that the
> > "key" (value identifier) may be bigger than the actual value and of
> > course the additional indirection also has a performance impact.
> > do you think that we should consider a minimum size for values to
> > be stored in this manner? personally, i think that this might make
> > sense.
>
> For consistency I would use such value records for all values,
> regardless of the value size. I'd like to keep the value identifiers
> as short as possible, optimally just 64 bits, to avoid too much
> storage and bandwidth overhead. The indirection costs could probably
> best be avoided by storing copies of short value contents along with
> the value identifiers where the values are referenced.
>
> > anyway, what key did you have in mind?
> > i would assume some sort of a hash (md5) could be great or is this
> > still more abstract?
>
> I was thinking about something more concrete, like a direct disk
> offset. The value identifier could for example be a 64 bit integer
> with the first 32 bits identifying the revision that contains the
> value and the last 32 bits being the offset of the value record within
> a "value file". I haven't yet calculated whether such a scheme gives
> us a large enough identifier space.
>
i would use MD5 of the contents as keys... so your search for
duplicates is very cheap. and i would not use a value record for small
values. e.g. the overhead of storing a 'boolean' is just too big.
considering you have 1 million nodes, with every node having an
'isCheckedOut' property.... or a lastModified, which is never the
same.

regards, toby
-- 
-----------------------------------------< tobias.bocanegra@day.com >---
Tobias Bocanegra, Day Management AG, Barfuesserplatz 6, CH - 4001 Basel
T +41 61 226 98 98, F +41 61 226 98 97
-----------------------------------------------< http://www.day.com >---

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 4/24/07, David Nuescheler <da...@gmail.com> wrote:
> > A value record would essentially be an array of bytes as defined in
> > Value.getStream(). In other words the integer value 123 and the string
> > value "123" would both be stored in the same value record. More
> > specific typing information would be indicated in the property record
> > that refers to that value. For example an integer property and a
> > string property could both point to the same value record, but have
> > different property types that indicate the default interpretation of
> > the value.
> i think that with small values we have to keep in mind that the
> "key" (value identifier) may be bigger than the actual value and of
> course the additional indirection also has a performance impact.
> do you think that we should consider a minimum size for values to
> be stored in this manner? personally, i think that this might make
> sense.

For consistency I would use such value records for all values,
regardless of the value size. I'd like to keep the value identifiers
as short as possible, optimally just 64 bits, to avoid too much
storage and bandwidth overhead. The indirection costs could probably
best be avoided by storing copies of short value contents along with
the value identifiers where the values are referenced.

> anyway, what key did you have in mind?
> i would assume some sort of a hash (md5) could be great or is this
> still more abstract?

I was thinking about something more concrete, like a direct disk
offset. The value identifier could for example be a 64 bit integer
with the first 32 bits identifying the revision that contains the
value and the last 32 bits being the offset of the value record within
a "value file". I haven't yet calculated whether such a scheme gives
us a large enough identifier space.
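
Packing and unpacking such an identifier would be trivial (a sketch,
assuming the 32/32 bit split described above):

    // Sketch: revision number in the high 32 bits, value record
    // offset within the revision's value file in the low 32 bits.
    static long toIdentifier(int revision, int offset) {
        return ((long) revision << 32) | (offset & 0xFFFFFFFFL);
    }

    static int revisionOf(long identifier) {
        return (int) (identifier >>> 32); // high 32 bits
    }

    static int offsetOf(long identifier) {
        return (int) identifier; // low 32 bits
    }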

> > Name and path values are stored as strings using namespace prefixes
> > from an internal namespace registry. Stability of such values is
> > enforced by restricting this internal namespace registry to never
> > remove or modify existing prefix mappings, only new namespace mappings
> > can be added.
> sounds good, i assume that the "internal" namespace registry gets
> its initial prefix mappings from the "public" namespace registry?
> i think having the same prefixes could be beneficial since remappings
> and removals are very rare even in the public registry and this would
> allow us to optimize the more typical case even better.

Exactly. In most cases, like when using the standard JCR prefix
mappings, the stored name and path values can be passed as-is through
the JCR API.

> > Achieving uniqueness of the value records requires a way to determine
> > whether an instance of a given value already exists. Some indexing is
> > needed to avoid having to traverse the entire set of existing value
> > records for each new value being created.
> i agree and i think we have to make sure that the overhead
> of calculating the key (value identifier) is reasonable, so
> "insert performance" doesn't suffer too much.

Note that the "value key" can well be different from the value
identifier. I was thinking of using something like a standard (and
fast) CRC code as the hash key for looking up potential matches. For
large binaries we could also calculate a SHA checksum to avoid having
to read through the entire byte stream when checking for equality. For
short values the CRC coupled with an exact byte comparison should be
good enough.
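
A sketch of those two checks with the standard java.util.zip and
java.security classes:

    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.zip.CRC32;

    // Sketch: cheap CRC-32 as the lookup key, SHA-1 only for large
    // values where re-reading the full stream would be expensive.
    static long hashKey(byte[] contents) {
        CRC32 crc = new CRC32();
        crc.update(contents);
        return crc.getValue();
    }

    static byte[] strongDigest(byte[] contents)
            throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-1").digest(contents);
    }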

> i could even see an asynchronous model that "inlines" values
> of all sizes initially and then leaves it up to some sort of garbage
> collection job to "extract" the large values and store them as
> immutable value records...
> this could preserve "insert performance" and allows us to benefit from
> efficient operations for things like copy, clone, etc., and of course
> the space consumption benefits.

I would be ready to trade some insert performance for more
consistency, but let's see how much the cost would be in practice.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by David Nuescheler <da...@gmail.com>.
hi jukka,

i am very much in favor of such an approach.

> My idea is to store each value in a unique and immutable "value
> record" identified by a "value identifier". Duplicate values are only
> stored once in a single value record. This saves space especially when
> storing multiple copies of large binary documents and allows value
> equality comparisons based on just the identifiers.
this sounds great for large (binary and string) property values.

> A value record would essentially be an array of bytes as defined in
> Value.getStream(). In other words the integer value 123 and the string
> value "123" would both be stored in the same value record. More
> specific typing information would be indicated in the property record
> that refers to that value. For example an integer property and a
> string property could both point to the same value record, but have
> different property types that indicate the default interpretation of
> the value.
i think that with small values we have to keep in mind that the
"key" (value identifier) may be bigger than the actual value and of
course the additional indirection also has a performance impact.
do you think that we should consider a minimum size for values to
be stored in this manner? personally, i think that this might make
sense.
anyway, what key did you have in mind?
i would assume some sort of a hash (md5) could be great or is this
still more abstract?

> Name and path values are stored as strings using namespace prefixes
> from an internal namespace registry. Stability of such values is
> enforced by restricting this internal namespace registry to never
> remove or modify existing prefix mappings, only new namespace mappings
> can be added.
sounds good, i assume that the "internal" namespace registry gets
its initial prefix mappings from the "public" namespace registry?
i think having the same prefixes could be beneficial since remappings
and removals are very rare even in the public registry and this would
allow us to optimize the more typical case even better.

> Achieving uniqueness of the value records requires a way to determine
> whether an instance of a given value already exists. Some indexing is
> needed to avoid having to traverse the entire set of existing value
> records for each new value being created.
i agree and i think we have to make sure that the overhead
of calculating the key (value identifier) is reasonable, so
"insert performance" doesn't suffer too much.
i could even see an asynchronous model that "inlines" values
of all sizes initially and then leaves it up to some sort of garbage
collection job to "extract" the large values and store them as
immutable value records...
this could preserve "insert performance" and allows us to benefit from
efficient operations for things like copy, clone, etc., and of course
the space consumption benefits.

so i guess in short i would be in favor of a value mechanism that can
handle transparently both (a) "inline" the values without using extra
indirection (for small values or quickly inserted ones) and
(b) immutable value records.

just my two cents.

regards,
david

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/6/07, Stefan Guggisberg <st...@gmail.com> wrote:
> something that just crossed my mind: i know a number of people
> want to store everything (config, meta data, binaries and content)
> in the same db in order to allow easy backup/restore of an entire
> repository. currently they can do so by using DatabaseFileSystem
> and the externalBLOBs=false option of DatabasePersistenceManager.
>
> do you plan to support db persistence for the binary store as well?

Yes. The DataStore interface should be very database friendly so there
shouldn't be any issues in implementing DB persistence for binaries. A
schema like this should do the trick:

    CREATE TABLE datastore ( id SERIAL, sha1 CHAR(40), data BLOB );

The getRecord() method would essentially be:

    SELECT data FROM datastore WHERE sha1=?

The addRecord() method would essentially be:

    INSERT INTO datastore (data) VALUES (?) -- calculate sha1 while inserting
    IF (SELECT 1 FROM datastore WHERE sha1=?) THEN
        DELETE FROM datastore WHERE id=?
    ELSE
        UPDATE datastore SET sha1=? WHERE id=?
    END IF
    COMMIT
    RETURN sha1
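
In plain JDBC the getRecord() method would be little more than this
(a sketch against the schema above, assuming an open Connection):

    import java.io.InputStream;
    import java.sql.*;

    // Sketch: fetch the contents of a record by its SHA-1 identifier.
    static InputStream getRecord(Connection con, String sha1)
            throws SQLException {
        PreparedStatement stmt = con.prepareStatement(
            "SELECT data FROM datastore WHERE sha1 = ?");
        stmt.setString(1, sha1);
        ResultSet rs = stmt.executeQuery();
        return rs.next() ? rs.getBinaryStream("data") : null;
    }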

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Stefan Guggisberg <st...@gmail.com>.
On 6/6/07, Stefan Guggisberg <st...@gmail.com> wrote:
> hi jukka,
>
> On 6/5/07, Jukka Zitting <ju...@gmail.com> wrote:
> > Hi,
> >
> > On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
> > > On 5/12/07, Jukka Zitting <ju...@gmail.com> wrote:
> > > > Based on the feedback I agree that it probably doesn't make sense to
> > > > keep track of unique copies of all values. However, avoiding extra
> > > > copies of large binaries is still a very nice feature, so I'd still
> > > > like to keep the single copy idea for those values. This is in fact
> > > > something that we might want to consider already for Jackrabbit 1.4
> > > > regardless of what we'll do with the NGP proposal.
> > >
> > > See JCR-926 for a practical application of this idea to current Jackrabbit.
> >
> > I just did a quick prototype where I made the InternalValue class turn
> > all incoming binary streams into data records using a global data
> > store. Internally the value would just be represented by the data
> > identifier.
> >
> > This allowed me to simplify quite a few things (for example to drop
> > all BLOBStore classes and custom handling of binary properties) and to
> > achieve *major* performance improvements for cases where large (>
> > 100kB) binaries are handled. For example the time to save a large file
> > was essentially cut in half and things like versioning or cloning
> > trees with large binaries would easily become faster by an order of
> > magnitude. With this change it is possible for example to copy a DVD
> > image file in milliseconds. What's even better, not only did this
> > change remove extra copying of binary values, it also pushed all
> > binaries out of the persistence or item state managers so that no
> > binary read or write operation would ever lock the repository!
>
> awesome, that's great news!
>
> is there a way to purge the binary store, i.e. remove unreferenced data?
> i am a bit concerned that doing a lot of add/remove operations would
> quickly exhaust available storage space. at least we need a concept
> of how to deal with this kind of situation.

something that just crossed my mind: i know a number of people
want to store everything (config, meta data, binaries and content)
in the same db in order to allow easy backup/restore of an entire
repository. currently they can do so by using DatabaseFileSystem
and the externalBLOBs=false option of DatabasePersistenceManager.

do you plan to support db persistence for the binary store as well?

cheers
stefan

>
> >
> > The downside of the change is that it requires backwards-incompatible
> > changes in jackrabbit-core, most notably pulling all blob handling out
> > of the existing persistence managers. Adopting the data store concept
> > would thus require migration of all existing repositories. Luckily
> > such migration would likely be relatively straightforward and we could
> > write tools to simplify the upgrade, but it would still be a major
> > undertaking.
> >
> > I would very much like to go forward with this approach, but I'm not
> > sure when would be the right time to do that. Should we target already
> > the 1.4 release in September/October, or would it be better to wait
> > for Jackrabbit 2.0 sometime next year? Alternatively, should we go for
> > a 2.0 release already this year with this and some other structural
> > changes, and have Jackrabbit 3.0 be the JSR 283 reference
> > implementation?
>
> since the jsr-283 public review is just around the corner we'll have to
> start work on the ri pretty soon. therefore i think the ri should target
> v2.0.
>
> wrt integrating JCR-926, both 1.4 and 2.0 would be fine with me.
>
> cheers
> stefan
>
> >
> > BR,
> >
> > Jukka Zitting
> >
>

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/6/07, Stefan Guggisberg <st...@gmail.com> wrote:
> is there a way to purge the binary store, i.e. remove unreferenced data?
> i am a bit concerned that doing a lot of add/remove operations would
> quickly exhaust available storage space. at least we need a concept
> of how to deal with this kind of situation.

I was thinking of using a garbage collector to reclaim unreferenced
data. I haven't yet implemented anything like that and there will be
open issues to resolve if we want to allow more than one repository to
use the same data store for binaries, but I don't think that this is a
question that we couldn't resolve. I'll look into more details on
this. I think it would be good if we had at least a rudimentary
solution available before releasing any of this code.
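
The simplest concept is probably a mark-and-sweep pass (a very rough
sketch; DataStore, DataIdentifier and the methods used below are all
hypothetical at this point):

    import java.util.HashSet;
    import java.util.Set;

    // Very rough sketch of a mark-and-sweep collection pass over a
    // hypothetical data store API.
    static void collectGarbage(
            DataStore store, Iterable<DataIdentifier> referenced) {
        Set<DataIdentifier> live = new HashSet<DataIdentifier>();
        for (DataIdentifier id : referenced) {
            live.add(id);               // mark everything still in use
        }
        for (DataIdentifier id : store.getAllIdentifiers()) {
            if (!live.contains(id)) {
                store.deleteRecord(id); // sweep the unreferenced rest
            }
        }
    }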

> since the jsr-283 public review is just around the corner we'll have to
> start work on the ri pretty soon. therefore i think the ri should target
> v2.0.
>
> wrt integrating JCR-926, both 1.4 and 2.0 would be fine with me.

OK. We can initially target 1.4 but if it seems like we won't have the
required migration tools and other supporting code and documentation,
then we can postpone this to 2.0.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Stefan Guggisberg <st...@gmail.com>.
hi jukka,

On 6/5/07, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
> > On 5/12/07, Jukka Zitting <ju...@gmail.com> wrote:
> > > Based on the feedback I agree that it probably doesn't make sense to
> > > keep track of unique copies of all values. However, avoiding extra
> > > copies of large binaries is still a very nice feature, so I'd still
> > > like to keep the single copy idea for those values. This is in fact
> > > something that we might want to consider already for Jackrabbit 1.4
> > > regardless of what we'll do with the NGP proposal.
> >
> > See JCR-926 for a practical application of this idea to current Jackrabbit.
>
> I just did a quick prototype where I made the InternalValue class turn
> all incoming binary streams into data records using a global data
> store. Internally the value would just be represented by the data
> identifier.
>
> This allowed me to simplify quite a few things (for example to drop
> all BLOBStore classes and custom handling of binary properties) and to
> achieve *major* performance improvements for cases where large (>
> 100kB) binaries are handled. For example the time to save a large file
> was essentially cut in half and things like versioning or cloning
> trees with large binaries would easily become faster by an order of
> magnitude. With this change it is possible for example to copy a DVD
> image file in milliseconds. What's even better, not only did this
> change remove extra copying of binary values, it also pushed all
> binaries out of the persistence or item state managers so that no
> binary read or write operation would ever lock the repository!

awesome, that's great news!

is there a way to purge the binary store, i.e. remove unreferenced data?
i am a bit concerned that doing a lot of add/remove operations would
quickly exhaust available storage space. at least we need a concept
of how to deal with this kind of situation.

>
> The downside of the change is that it requires backwards-incompatible
> changes in jackrabbit-core, most notably pulling all blob handling out
> of the existing persistence managers. Adopting the data store concept
> would thus require migration of all existing repositories. Luckily
> such migration would likely be relatively straightforward and we could
> write tools to simplify the upgrade, but it would still be a major
> undertaking.
>
> I would very much like to go forward with this approach, but I'm not
> sure when would be the right time to do that. Should we target already
> the 1.4 release in September/October, or would it be better to wait
> for Jackrabbit 2.0 sometime next year? Alternatively, should we go for
> a 2.0 release already this year with this and some other structural
> changes, and have Jackrabbit 3.0 be the JSR 283 reference
> implementation?

since the jsr-283 public review is just around the corner we'll have to
start work on the ri pretty soon. therefore i think the ri should target
v2.0.

wrt integrating JCR-926, both 1.4 and 2.0 would be fine with me.

cheers
stefan

>
> BR,
>
> Jukka Zitting
>

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/16/07, Jukka Zitting <ju...@gmail.com> wrote:
> On 5/12/07, Jukka Zitting <ju...@gmail.com> wrote:
> > Based on the feedback I agree that it probably doesn't make sense to
> > keep track of unique copies of all values. However, avoiding extra
> > copies of large binaries is still a very nice feature, so I'd still
> > like to keep the single copy idea for those values. This is in fact
> > something that we might want to consider already for Jackrabbit 1.4
> > regardless of what we'll do with the NGP proposal.
>
> See JCR-926 for a practical application of this idea to current Jackrabbit.

I just did a quick prototype where I made the InternalValue class turn
all incoming binary streams into data records using a global data
store. Internally the value would just be represented by the data
identifier.

This allowed me to simplify quite a few things (for example to drop
all BLOBStore classes and custom handling of binary properties) and to
achieve *major* performance improvements for cases where large (>
100kB) binaries are handled. For example the time to save a large file
was essentially cut in half and things like versioning or cloning
trees with large binaries would easily become faster by an order of
magnitude. With this change it is possible for example to copy a DVD
image file in milliseconds. What's even better, not only did this
change remove extra copying of binary values, it also pushed all
binaries out of the persistence or item state managers so that no
binary read or write operation would ever lock the repository!

The downside of the change is that it requires backwards-incompatible
changes in jackrabbit-core, most notably pulling all blob handling out
of the existing persistence managers. Adopting the data store concept
would thus require migration of all existing repositories. Luckily
such migration would likely be relatively straightforward and we could
write tools to simplify the upgrade, but it would still be a major
undertaking.

I would very much like to go forward with this approach, but I'm not
sure when would be the right time to do that. Should we target already
the 1.4 release in September/October, or would it be better to wait
for Jackrabbit 2.0 sometime next year? Alternatively, should we go for
a 2.0 release already this year with this and some other structural
changes, and have Jackrabbit 3.0 be the JSR 283 reference
implementation?

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 5/12/07, Jukka Zitting <ju...@gmail.com> wrote:
> Based on the feedback I agree that it probably doesn't make sense to
> keep track of unique copies of all values. However, avoiding extra
> copies of large binaries is still a very nice feature, so I'd still
> like to keep the single copy idea for those values. This is in fact
> something that we might want to consider already for Jackrabbit 1.4
> regardless of what we'll do with the NGP proposal.

See JCR-926 for a practical application of this idea to current Jackrabbit.

BR,

Jukka Zitting

Re: NGP: Value records

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

Thanks for the comments so far on this topic! I've been thinking about
this a bit more and I now have the second iteration ready for review.
Read on...

Based on the feedback I agree that it probably doesn't make sense to
keep track of unique copies of all values. However, avoiding extra
copies of large binaries is still a very nice feature, so I'd still
like to keep the single copy idea for those values. This is in fact
something that we might want to consider already for Jackrabbit 1.4
regardless of what we'll do with the NGP proposal.

The idea is to keep all binary values (I guess it's easier to manage
things by value type than by value size) in a global binary store that
keeps only a single copy of any unique binary stream. Binary values
are stored in the global store as soon as they are received from the
client (for example ValueFactory.createValue(InputStream)) and only a
resulting value identifier is kept as a reference to the binary
stream.

The binary store persists all received values immediately and never
modifies or removes stored binaries (unless there's an explicit
garbage collection process). This allows the binary store to exist
outside any transaction scopes, and it can also be concurrently
accessed by any number of cluster nodes or other processes. Even
completely separate content repositories could share the binary store.
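
A sketch of the write path (assumed names only; the content digest is
what makes identical streams resolve to the same identifier):

    import java.io.*;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;

    // Sketch: spool the stream to a temporary file while digesting it,
    // then keep at most one copy per unique content digest.
    static String addBinary(File storeDir, InputStream in)
            throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        File temp = File.createTempFile("upload", null, storeDir);
        OutputStream out = new FileOutputStream(temp);
        InputStream din = new DigestInputStream(in, sha1);
        byte[] buffer = new byte[8192];
        for (int n = din.read(buffer); n != -1; n = din.read(buffer)) {
            out.write(buffer, 0, n);
        }
        out.close();
        StringBuilder id = new StringBuilder();
        for (byte b : sha1.digest()) {
            id.append(String.format("%02x", b));
        }
        File record = new File(storeDir, id.toString());
        if (record.exists()) {
            temp.delete();        // duplicate content, keep single copy
        } else {
            temp.renameTo(record);
        }
        return id.toString();     // the value identifier
    }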

BR,

Jukka Zitting