Posted to dev@jackrabbit.apache.org by Thomas Mueller <th...@gmail.com> on 2007/06/11 18:37:56 UTC

Re: Next Generation Persistence

Hi,

It sounds like MVCC (multi version concurrency control) in databases.
Question: in your view, do we need to change the PersistenceManager
API as well?

* Removing nodes *
Last section: when removing B, the child C is removed as well. Is it
important to say 'remove C', and not 'remove B including children', in
the revision? What would happen if another session added a child D
to C in the meantime, and committed this change? If there is no locking,
how are large deletes / updates done?

* Revision scope *
In my opinion, the scope should be the entire repository.

* Base Revision *
The text talks about 'the base revision of the session' (it can be
updated in some cases). But to support Item.refresh, doesn't a session
need to keep multiple base revisions: one for the session, and one for
each refreshed Item (including children)?

> The base revision of a session can optionally be changed when more recent revisions are persisted during the session lifetime.
Does 'optionally' mean it is a setting of the session? Is there a JCR
API feature to set this option?

* Persisting a Subtree *
> If the operation fails, then the two new revisions are discarded and no changes are made to the session.
Item.save says: 'If validation fails, then no pending changes are
saved and they remain recorded on the Session'

* Workspace Operations *
> If the operation succeeds, the session is updated to use the persisted revision as the new base revision.
I think the base revision of the session should not be updated in this case.

* Transactions *
> This model can also easily support two-phase commits in a distributed transaction.
I agree, two-phase commit is no problem. However, XAResource.recover
(which obtains the list of prepared transactions) is tricky to
implement. As far as I know, it is currently not supported. If we want
it in the future, we had better think early about how to implement it.
I suggest we describe the problem and possible solutions, but defer
the implementation until it is required.

* Namespace and Node Type Management *
Namespace and node type management in jcr:system: Good idea! However,
without custom data structures it will be slow (if the jcr:system
subtree is read whenever a namespace is resolved). What about custom
data structures that cache the latest state (and listen for changes to
the relevant subtree)? I guess only the latest version is relevant, or
is there a situation where older versions of node types / namespaces
are required? If yes, things will be complicated.

* Internal Data Structures *
I don't fully understand this section. I think this section should be
extended. In databases, the approach is usually:

A: There is a main store (where the 'base' revisions of items are
kept, indexed by item).
B: Committed revisions are stored sequentially in the redo log, which
can only be read sequentially.
C: Draft and old revisions are mainly kept in memory (saved to disk
only if memory runs out).
D: Each session keeps an undo log for uncommitted changes.
E: Committed revisions are persisted in the main store, and if a
session references an older revision of the same item, an in-memory
copy is made (copy on write, see C).

Of course we don't need to rebuild a database, but maybe reuse some ideas.
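
As a toy illustration of points A, C, and E (copy on write), here is a
minimal sketch in Java. All names are hypothetical, it keeps just one
old version per item for brevity, and it is nothing like Jackrabbit's
actual PersistenceManager API:

    import java.util.HashMap;
    import java.util.Map;

    final class CopyOnWriteStore {
        // A: the main store keeps the latest committed value per item
        private final Map<String, String> mainStore = new HashMap<String, String>();
        // C: old versions are kept in memory for sessions that need them
        private final Map<String, String> oldVersions = new HashMap<String, String>();

        // E: committing moves the previous value aside (copy on write)
        synchronized void commit(String itemId, String newValue) {
            String old = mainStore.put(itemId, newValue);
            if (old != null) {
                oldVersions.put(itemId, old);
            }
        }

        synchronized String read(String itemId, boolean wantOldRevision) {
            if (wantOldRevision && oldVersions.containsKey(itemId)) {
                return oldVersions.get(itemId);
            }
            return mainStore.get(itemId);
        }
    }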

* Combined Revisions *
I don't understand why one would do that, but it is probably because
you have different internal data structures in mind.



When to send large objects to the server: There are two use cases:
- Client is far away: keep changes on the client as long as possible,
send batches
- Client is close by: avoid temporary copies of large (binary) data on
the client
Both cases should be supported, but which one is more important?
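
For the second case, the JCR API already allows handing the repository
an input stream, so that no temporary client-side copy is needed. A
minimal sketch (the node path and file name are made up):

    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.jcr.Node;
    import javax.jcr.Session;

    class StreamingUpload {
        static void upload(Session session) throws Exception {
            Node file = session.getRootNode().getNode("files/report");
            InputStream in = new FileInputStream("large-object.bin");
            try {
                // the repository can stream this directly to its store
                file.setProperty("jcr:data", in);
                session.save();
            } finally {
                in.close();
            }
        }
    }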

Thomas

Re: Next Generation Persistence

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/11/07, Thomas Mueller <th...@gmail.com> wrote:
> It sounds like MVCC (multi version concurrency control) in databases.

Yes, MVCC was one of my inspirations (see [1]), although I must admit
that I only know MVCC at a high level.

[1] http://article.gmane.org/gmane.comp.apache.jackrabbit.devel/10642/match=mvcc

> Question: in your view, do we need to change the PersistenceManager
> API as well?

I think that's inevitable.

> * Removing nodes *
> Last section: when removing B, the child C is removed as well. Is it
> important to say 'remove C', and not 'remove B including children', in
> the revision?

I think the essential point to consider is the referenceability of the
nodes. We probably need to explicitly mark all referenceable nodes as
deleted, but I think other nodes can be handled just by marking the
root of the non-referenceable subtree as deleted.
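
A sketch of the two kinds of delete records this implies (the types
are hypothetical, purely to illustrate the distinction):

    abstract class DeleteRecord {}

    // each referenceable node is marked individually, by its UUID
    final class ReferenceableDelete extends DeleteRecord {
        final String uuid;
        ReferenceableDelete(String uuid) { this.uuid = uuid; }
    }

    // one record covers a whole non-referenceable subtree
    final class SubtreeDelete extends DeleteRecord {
        final String rootPath;
        SubtreeDelete(String rootPath) { this.rootPath = rootPath; }
    }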

> What would happen if another session added a child D
> to C in the meantime, and committed this change?

During commit the other session would notice that intervening changes
have occurred and would update the local changes to match the latest
state of the repository. This update process should validate paths at
least up to the closest referenceable ancestor node.

> If there is no locking, how are large deletes / updates done?

My idea is to have all synchronization at the point when transient
changes are persisted. See the flowchart in the "Draft revision"
section.

The time required to validate a large delete/update could become quite
high and such changes could easily end up invalidating the transient
changes of a number of concurrent sessions, but I guess similar
concerns apply regardless of how large changes are handled.

> * Revision scope *
> In my opinion, the scope should be the entire repository.

I tend to agree after considering the alternatives.

> * Base Revision *
> The text talks about 'the base revision of the session' (it can be
> updated in some cases). But to support Item.refresh, doesn't a session
> need to keep multiple base revisions: one for the session, and one for
> each refreshed Item (including children)?

The spec doesn't specify whether or when the session state gets
refreshed, so I think we could implement Item.refresh like this:

    public void refresh(boolean keepChanges) throws RepositoryException {
        if (!keepChanges) {
            // drop transient changes for this item and descendants
        }
        getSession().refresh(true);
    }

This way we could keep a single consistent base revision for the entire session.

> > The base revision of a session can optionally be changed when more
> > recent revisions are persisted during the session lifetime.
> Does 'optionally' mean it is a setting of the session?

The "auto-refresh" mode could be set either for the entire repository
or for an individual session.

> Is there a JCR API feature to set this option?

No, the spec doesn't really specify when or even if session state gets
refreshed between explicit refresh() calls.

We could for example add a custom JackrabbitSession.setAutoRefresh()
method or allow the option to be specified during login like this:

    Session session = repository.login("workspace?refresh=auto");
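
The proposed setter could be declared like this (hypothetical, no such
method exists yet):

    public interface JackrabbitSession extends javax.jcr.Session {
        // when enabled, the base revision follows new persisted revisions
        void setAutoRefresh(boolean autoRefresh);
    }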

> * Persisting a Subtree *
> > If the operation fails, then the two new revisions are discarded and
> > no changes are made to the session.
> Item.save says: 'If validation fails, then no pending changes are
> saved and they remain recorded on the Session'

My intention was that the original state of the session, including the
original draft revision, is not changed if the operation fails. To
clarify:

Let's represent the global state of the repository as R = R(X), where
X is the last persisted revision, and the local state of a session as
S = S(B, D), where B is the base revision and D the draft revision of
the session.

Now, say that D contains subtree operations P and some other changes
Q, i.e. D = P + Q and intersect(P, Q) = 0. Then persisting the subtree
operations would result in the state changes R' = R(X + P) and
S' = S(X + P, Q), assuming that the update operation B => X and the
subsequent validation of P succeed. If the operation fails, then the
session state isn't changed: S' = S = S(B, D).

I hope I didn't get too complex there...
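
To make the notation concrete, here is a minimal sketch in Java; all
of the types (Revision, ChangeSet, NgpStore) are hypothetical and
exist only to mirror the formulas above:

    final class SubtreePersistSketch {

        static final class Revision {
            final long id;
            Revision(long id) { this.id = id; }
        }

        interface ChangeSet {
            ChangeSet minus(ChangeSet other); // Q = D - P
        }

        interface NgpStore {
            // updates base B to the latest revision X, validates the
            // subtree operations P against it and persists them;
            // throws on failure
            Revision persist(Revision base, ChangeSet p) throws Exception;
        }

        static final class SessionState {
            final Revision base;   // B
            final ChangeSet draft; // D = P + Q
            SessionState(Revision base, ChangeSet draft) {
                this.base = base;
                this.draft = draft;
            }
            // returns S' = S(X + P, Q); if persist() fails, the exception
            // propagates and the original S = S(B, D) is left untouched
            SessionState persistSubtree(NgpStore store, ChangeSet p)
                    throws Exception {
                Revision xPlusP = store.persist(base, p);
                return new SessionState(xPlusP, draft.minus(p));
            }
        }
    }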

> * Workspace Operations *
> > If the operation succeeds, the session is updated to use the persisted
> > revision as the new base revision.
> I think the base revision of the session should not be updated in this case.

I think it would be very confusing for a client to do a workspace move
or a copy and then not be able to see the changes before an explicit
refresh.

As discussed for Item.refresh, I think we have quite a lot of freedom
in deciding when and how the session state gets refreshed.

> * Transactions *
> > This model can also easily support two-phase commits in a distributed
> > transaction.
> I agree, two-phase commit is no problem. However, XAResource.recover
> (which obtains the list of prepared transactions) is tricky to
> implement. As far as I know, it is currently not supported. If we want
> it in the future, we had better think early about how to implement it.
> I suggest we describe the problem and possible solutions, but defer
> the implementation until it is required.

You're right, we currently have nothing like that. I definitely don't
understand all the fine points in XA, but assuming we keep all
relevant information persisted within a revision I think it should be
possible to implement fairly complex recovery mechanisms.

> * Namespace and Node Type Management *
> Namespace and node type management in jcr:system: Good idea! However,
> without custom data structures it will be slow (if the jcr:system
> subtree is read whenever a namespace is resolved). What about custom
> data structures that cache the latest state (and listen for changes to
> the relevant subtree)? I guess only the latest version is relevant, or
> is there a situation where older versions of node types / namespaces
> are required? If yes, things will be complicated.

Caching the information in custom data structures would definitely
make sense. We could even persist those structures as part of the
revision whenever namespace or node type changes are made, and perhaps
include in each persisted revision a reference to the last revision
where such "schema" changes were made. A session could then access the
latest caches simply by following that reference. This way we wouldn't
need any observation listeners, and we could support concurrent
repository views with different schema versions.
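
A sketch of that reference chain (the types are made up for
illustration):

    final class RevisionInfo {
        final long revisionId;
        // the last revision in which namespace or node type changes were
        // persisted; the matching caches are stored with that revision
        final long lastSchemaRevisionId;

        RevisionInfo(long revisionId, long lastSchemaRevisionId) {
            this.revisionId = revisionId;
            this.lastSchemaRevisionId = lastSchemaRevisionId;
        }
    }

A session on any revision would load the namespace and node type
caches persisted with revision lastSchemaRevisionId, with no
observation listeners needed.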

> * Internal Data Structures *
> I don't fully understand this section. I think this section should be
> extended.

Agreed. :-) I have some vague ideas on how we could best persist and
access the revision data, but we're still far from even working
prototypes. I included the data structure section in the proposal
mostly to record some of my early ideas and to make it clear that this
is probably the area that will need the most work...

Having someone with solid database background commenting is much
appreciated. :-)

> In databases, the approach is usually:
>
> A: There is a main store (where the 'base' revisions of items are
> kept, indexed by item).
> B: Committed revisions are stored sequentially in the redo log, which
> can only be read sequentially.
> C: Draft and old revisions are mainly kept in memory (saved to disk
> only if memory runs out).
> D: Each session keeps an undo log for uncommitted changes.
> E: Committed revisions are persisted in the main store, and if a
> session references an older revision of the same item, an in-memory
> copy is made (copy on write, see C).
>
> Of course we don't need to rebuild a database, but maybe reuse some ideas.

Exactly! I would be very interested in exploring ways to somehow merge
the best practices from the row-based database world with the way, for
example, Subversion incrementally stores a tree structure. See also
Tobias' notes on this.

Again, I must confess that I'm not the best expert in data structures
for low-level persistence as I've spent most of my career working
higher up on the software stack, mostly building stuff on top of SQL,
HTTP, and nowadays JCR.

> * Combined Revisions *
> I don't understand why one would do that, but it is probably because
> you have different internal data structures in mind.

I'm coming from a high-end view where there is no globally indexed set
of "base" versions of the stored items; instead, content is
represented as a sequential set of revisions that each contain a
change set against previous revisions. In systems like Subversion such
a storage model works fine, as everything is accessed hierarchically,
but as soon as referenceability (with back-references!) and search
features are included, I fear there will be problems with such an
approach. The main motivation for combining (old) revisions is to
improve the efficiency of indexing.

> When to send large objects to the server: There are two use cases:
> - Client is far away: keep changes on the client as long as possible,
> send batches
> - Client is close by: avoid temporary copies of large (binary) data on
> the client
> Both cases should be supported, but which one is more important?

Agreed. I think we can (at least for now) focus on a local client as
the SPI work is already trying to solve the remote access and batching
mechanisms. I guess we eventually need to produce some sort of
synthesis of SPI and NGP (or another future persistence architecture),
but that would probably be something for Jackrabbit 4.0 or 5.0...

BR,

Jukka Zitting

Re: Next Generation Persistence

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/14/07, Thomas Mueller <th...@gmail.com> wrote:
> I would like to better understand the reasons for NGP. I found the
> following issues in JIRA, but I think most of those problems can be
> solved even without NGP. Are there any other issues to consider (and
> issues without JIRA entries)?

The reason I started drafting the NGP proposal was to come up with an
architectural roadmap for something like the next five years of
Jackrabbit. The current architecture (as summarized in [1] and [2])
already has quite a lot of history behind it, and I think we are
seeing some new use cases that it doesn't support very naturally. For
example, the clustering feature is more like an add-on than an
integral part of the repository. Another example is bundle
persistence, which has to go through extra trouble (it has its own
caches, converts between bundles and item states, etc.) to fit the
PersistenceManager API.

[1] http://jackrabbit.apache.org/doc/arch/overview.html
[2] http://jackrabbit.apache.org/doc/arch/operate/index.html

The actual model I came up with, NGP, is just one possible approach,
based mainly on the following goals:

* Focus on read performance, i.e. massive concurrency (no locking) and
caching (no invalidation) opportunities for typical read loads
* Transactions and clustering as core features, with no need for
external databases
* Hot backup and point-in-time recovery as core features

I'm open to debating whether these really are the goals that we want to achieve.

Another important reason why I wanted to start this kind of
architectural discussion is that it opens us up to new kinds of
solutions that we can start implementing already in the current
architecture; the data store concept is a prime example of such
progress.

> What do you think about using the same connection for versioning and
> regular access? I know it requires refactoring, and a new setting in
> repository.xml. Anything else?

This is another case where the current architecture limits our
options. The persistence model is best suited for a case where each
workspace and the versioning store have their own independent backend
stores. It would be interesting to explore how far we can go with
incremental changes.

BR,

Jukka Zitting

Re: Next Generation Persistence

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/18/07, Thomas Mueller <th...@gmail.com> wrote:
> I am currently thinking about a radical solution (you can call it
> brainstorming): Store the index in a database. This would probably be
> slower than Lucene, a big disadvantage. Additionally, store data of
> all workspaces in the same database, using the same connection. Store
> versions in the same database. Use one database connection per
> session. Don't use any caching in Jackrabbit (only the database would
> cache data). This way you don't run into problems with transaction
> rollbacks, and you don't need to clean up the index from time to time. The
> Jackrabbit core would get much simpler. The database would not really
> need to be a SQL database, but it would need a bit more functionality
> than the current PersistenceManager.

Optimally I'd like to see Jackrabbit *be* that database; search
indexes, caching, transactions and all. The hard part is getting to
that state without a major revolution in Jackrabbit internals, so I
quite like your line of thinking in that it seems to offer a more
evolutionary path towards that goal. :-)

BR,

Jukka Zitting

Re: Next Generation Persistence

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

I am currently thinking about a radical solution (you can call it
brainstorming): Store the index in a database. This would probably be
slower than Lucene, a big disadvantage. Additionally, store data of
all workspaces in the same database, using the same connection. Store
versions in the same database. Use one database connection per
session. Don't use any caching in Jackrabbit (only the database would
cache data). This way you don't run into problems with transaction
rollbacks, and you don't need to clean up the index from time to time. The
Jackrabbit core would get much simpler. The database would not really
need to be a SQL database, but it would need a bit more functionality
than the current PersistenceManager.

Anyway, just an idea.
Thomas

On 6/18/07, Marcel Reutegger <ma...@gmx.net> wrote:
> Thomas Mueller wrote:
> > And after a crash, you can re-index everything, then things are
> > consistent and no data is lost (only time is lost). So solving the
> > transactional problem of the index is not highest priority. Still it
> > would be nice to have a clean solution.
>
> I agree.
>
> IMO the index should be held in the workspace together with the
> content. This would require some kind of pre-commit hook that allows
> the query handler to change some additional content (a new index
> segment, etc.).
>
> regards
>   marcel
>

Re: Next Generation Persistence

Posted by Marcel Reutegger <ma...@gmx.net>.
Thomas Mueller wrote:
> And after a crash, you can re-index everything, then things are
> consistent and no data is lost (only time is lost). So solving the
> transactional problem of the index is not highest priority. Still it
> would be nice to have a clean solution.

I agree.

IMO the index should be held in the workspace together with the
content. This would require some kind of pre-commit hook that allows
the query handler to change some additional content (a new index
segment, etc.).
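
One possible shape for such a hook (purely hypothetical, not an
existing Jackrabbit interface):

    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    interface PreCommitHook {
        // invoked just before the pending changes of 'session' are
        // persisted; anything the hook adds (e.g. a new index segment
        // node) becomes part of the same atomic commit
        void beforeCommit(Session session) throws RepositoryException;
    }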

regards
  marcel

Re: Next Generation Persistence

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/15/07, Thomas Mueller <th...@gmail.com> wrote:
> And after a crash, you can re-index everything, then things are
> consistent and no data is lost (only time is lost). So solving the
> transactional problem of the index is not highest priority. Still it
> would be nice to have a clean solution.

Related to NGP, I had this wishful thought that we could perhaps
integrate Lucene segment files with the NGP revision model, so that
the index update would actually become a part of a persisted
revision...

I guess that would require a whole new level of integration between
Jackrabbit and Lucene, but it would be nice to see what we could
achieve in case there's interest within both projects.

BR,

Jukka Zitting

Re: Next Generation Persistence

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

Thanks for the link!

And after a crash, you can re-index everything, then things are
consistent and no data is lost (only time is lost). So solving the
transactional problem of the index is not highest priority. Still it
would be nice to have a clean solution.

Thomas


On 6/15/07, Marcel Reutegger <ma...@day.com> wrote:
> Thomas Mueller wrote:
> > I didn't find an open issue for: The search index is updated outside
> > of transactions. This doesn't feel right (I like consistency), but in
> > practice this is not a problem as long as all saved objects are in the
> > index: the query engine filters non-existing results. Is this correct?
>
> yes, this is correct.
>
> there is a jira issue that mentions this problem:
> http://issues.apache.org/jira/browse/JCR-204
>
> regards
>   marcel
>

Re: Next Generation Persistence

Posted by Marcel Reutegger <ma...@day.com>.
Thomas Mueller wrote:
> I didn't find an open issue for: The search index is updated outside
> of transactions. This doesn't feel right (I like consistency), but in
> practice this is not a problem as long as all saved objects are in the
> index: the query engine filters non-existing results. Is this correct?

yes, this is correct.

there is a jira issue that mentions this problem: 
http://issues.apache.org/jira/browse/JCR-204

regards
  marcel

Re: Next Generation Persistence

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 6/15/07, Thomas Mueller <th...@gmail.com> wrote:

> ...Let's say 90% of the storage space used in a repository is taken up
> by large objects (binary data). With the GlobalDataStore, cloning (and
> versioning) should get 9 times faster (and use much less storage
> space)....

You're right, in the case that you mention. Looks like some of us will
need to buy some <favorite-beverage-of-your-choice> for whoever
implements this ;-)

> ...On the other hand, it is better to copy short strings, instead of
> saving them only once and keeping a reference, because of the
> overhead, and because the potential saving is small. An exception is
> cloning whole (node) subtrees. Here, copy-on-write would help....

Agreed, some clever decisions might need to be made depending on
what's being cloned.

-Bertrand

Re: Next Generation Persistence

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

> Making the repository more "Subversion-like" by implementing some form
> of cheap/fast workspace cloning, and cheap/fast tagging of versions,
> might help a lot for some types of applications.

Let's say 90% of the storage space used in a repository is taken up by
large objects (binary data). With the GlobalDataStore, cloning (and
versioning) should get 9 times faster (and use much less storage
space), since the binaries themselves no longer need to be copied,
only references to them.

On the other hand, it is better to copy short strings, instead of
saving them only once and keeping a reference, because of the
overhead, and because the potential saving is small. An exception is
cloning whole (node) subtrees. Here, copy-on-write would help.
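
As a toy illustration of why a data store makes cloning cheap
(content-addressed storage; everything is kept in memory here for
brevity, while a real implementation would stream to disk):

    import java.security.MessageDigest;
    import java.util.HashMap;
    import java.util.Map;

    final class ToyDataStore {
        private final Map<String, byte[]> blobs = new HashMap<String, byte[]>();

        // stores the binary once, keyed by its digest; cloning a node
        // then only copies this short identifier, not the binary itself
        String store(byte[] data) throws Exception {
            byte[] digest = MessageDigest.getInstance("SHA-1").digest(data);
            StringBuilder id = new StringBuilder();
            for (byte b : digest) {
                id.append(String.format("%02x", b));
            }
            blobs.put(id.toString(), data);
            return id.toString();
        }

        byte[] read(String id) {
            return blobs.get(id);
        }
    }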

Thomas

Re: Next Generation Persistence

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 6/14/07, Thomas Mueller <th...@gmail.com> wrote:

> ...Are there any other issues to consider (and issues
> without JIRA entry)?...

Making the repository more "Subversion-like" by implementing some form
of cheap/fast workspace cloning, and cheap/fast tagging of versions,
might help a lot for some types of applications.

Not sure how much these features are related to the persistence layer
though; they probably have an impact on other parts of the code as well.

-Bertrand

Re: Next Generation Persistence

Posted by Thomas Mueller <th...@gmail.com>.
Hi,

I would like to better understand the reasons for NGP. I found the
following issues in JIRA, but I think most of those problems can be
solved even without NGP. Are there any other issues to consider (and
issues without JIRA entries)?

http://issues.apache.org/jira/browse/JCR-314
Allow concurrent writes on the PM. The root problem seems to be:
storing large binary objects blocks others?

http://issues.apache.org/jira/browse/JCR-926
Global data store for binaries (stream large objects early without
blocking others)

http://issues.apache.org/jira/browse/JCR-926
Multiple connections problem / Versioning operations.
Could be solved by using the same connection for versioning.

https://issues.apache.org/jira/browse/JCR-630
Versioning operations are not fully transactional.
Could be solved by using the same connection for versioning.

http://issues.apache.org/jira/browse/JCR-631
Change resources sequence during transaction commit.
Could be solved by using the same connection for versioning.

http://issues.apache.org/jira/browse/JCR-890
Concurrent read-only access to a session
Unrelated (multiple threads in one session, I would use synchronize)

http://issues.apache.org/jira/browse/JCR-851
Handling of binary properties (streams) in QValue interface: unrelated
to this discussion, SPI specific

I didn't find an open issue for: The search index is updated outside
of transactions. This doesn't feel right (I like consistency), but in
practice this is not a problem as long as all saved objects are in the
index: the query engine filters non-existing results. Is this correct?

What do you think about using the same connection for versioning and
regular access? I know it requires refactoring, and a new setting in
repository.xml. Anything else?

I found some more information about MVCC. It looks like PostgreSQL,
Oracle, and newer versions of MS-SQL Server work like this:

- Reading: read the 'base revision of the session' (writers don't block readers)
- Writing: lock the node against other writers and create a new 'version'

Using write locks avoids the following problem:

- Session A starts a transaction, updates Node 1 (x=4)
- Session B starts a transaction, updates Node 1 (x=5), commits (saves)
- Session A does some more work, tries to commit -> Exception

Theoretically, session A should catch the exception and retry. But
many applications expect it to work (it works now). Also, retrying
will not work if the transaction is long and Node 1 is updated a lot
by other sessions (let's say it is a counter). That's why I would use
locks for writes. MVCC is used for reading, so readers don't block
writers (like they do now?), resulting in good concurrency for most
situations.
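
A minimal sketch of that scheme: reads go against the session's base
revision (MVCC, never blocking), while the first write to a node takes
a lock that is held until commit. All names are hypothetical:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    final class WriteLockTable {
        private final Map<String, Object> lockOwners = new HashMap<String, Object>();

        // called on the first update of a node within a transaction
        synchronized void writeLock(String nodeId, Object session)
                throws InterruptedException {
            while (lockOwners.containsKey(nodeId)
                    && lockOwners.get(nodeId) != session) {
                wait(); // another writer holds this node; block
            }
            lockOwners.put(nodeId, session);
        }

        // called on commit or rollback
        synchronized void releaseLocks(Object session) {
            lockOwners.values().removeAll(Collections.singleton(session));
            notifyAll();
        }
    }

In the example above, session B's update of Node 1 would then simply
block until session A commits, and session A would never see a late
exception.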

Explicit write locks: Sometimes an application doesn't need to update
a node but wants to ensure it's not updated by somebody else. This
feature is not that important; in databases, this is SELECT ... FOR
UPDATE, and most people don't really need it. This case is not
documented in the JCR API specs, but Jackrabbit could add a write lock
when calling Item.save() (even when no changes are made).

Thomas

P.S. If somebody wants to cross-post it to Lucene and Derby, feel
free. I think the requirements of Lucene and Derby are different, but
I might be wrong.

Re: Next Generation Persistence

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 6/11/07, Jukka Zitting <ju...@gmail.com> wrote:

> ....Do you have some good forum in mind for initiating such discussions,
> or should we just start pinging other projects to gauge interest?...

I didn't have any particular forum in mind...maybe some cross-posted
discussion between Lucene, Derby and Jackrabbit would be a good start?
Getting people from these three projects together might be very
powerful...

And a project at labs.apache.org could serve as a playground for ideas.

-Bertrand

Re: Next Generation Persistence

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On 6/11/07, Bertrand Delacretaz <bd...@apache.org> wrote:
> On 6/11/07, Thomas Mueller <th...@gmail.com> wrote:
> > ...It sounds like MVCC (multi version concurrency control) in databases....
>
> > ...Of course we don't need to rebuild a database, but maybe reuse
> > some ideas...
>
> These snippets, and the similarities between Jackrabbit's storage and
> databases, make me think that collaborating with other projects might
> help...I'm no expert on database internals, but has someone compared
> Jackrabbit's persistence requirements with Derby's, or with those of
> another Java-based database?

I actually discussed such potential co-operation with some Derby
people during the last ApacheCon US in Austin, but nothing has really
come out of it so far.

On the other end of the spectrum there also seems to be some interest
within Lucene in bringing their persistence model closer to a
"database" model. I think there might be some nice convergence ahead
(see [1] for more), and it would be way cool if we could achieve
something like that in cooperation with other projects.

[1] http://mail-archives.apache.org/mod_mbox/lucene-general/200705.mbox/%3c510143ac0705111204r62bda0b5v8b063033fba227b@mail.gmail.com%3e

> If the requirements match reasonably well, it might make sense to
> collaborate with others, and maybe even create reusable "Java
> persistence" components in a separate or sub-project?

Do you have some good forum in mind for initiating such discussions,
or should we just start pinging other projects to gauge interest?

BR,

Jukka Zitting

Re: Next Generation Persistence

Posted by Bertrand Delacretaz <bd...@apache.org>.
On 6/11/07, Thomas Mueller <th...@gmail.com> wrote:

> ...It sounds like MVCC (multi version concurrency control) in databases....

> ...Of course we don't need to rebuild a database, but maybe reuse some ideas...

These snippets, and the similarities between Jackrabbit's storage and
databases, make me think that collaborating with other projects might
help...I'm no expert on database internals, but has someone compared
Jackrabbit's persistence requirements with Derby's, or with those of
another Java-based database?

If the requirements match reasonably well, it might make sense to
collaborate with others, and maybe even create reusable "Java
persistence" components in a separate or sub-project?

Just my 2 swiss centimes...

-Bertrand