Posted to dev@jackrabbit.apache.org by Cris Daniluk <cr...@gmail.com> on 2007/08/02 15:13:12 UTC

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

I've been observing this thread, pondering, and feel the need to weigh in.

Marcel's point here is that the JTA implementation doesn't allow the
RDBMS transaction to participate in the XA. I can see a good argument
for this - after all, Jackrabbit maintains an effective journal and
not all RDBMS can participate in XA.

That said, by the truest definition of a transaction, does just
writing to the changelog truly constitute a guaranteed transaction?
What if the RDBMS cannot be written to due to an integrity violation?
I don't think the cohesion between the RDBMS and the Jackrabbit
implementation is so tight that it is fair to argue any inconsistency
would be similar to datafile corruption.

Also, as Marcel noted, a bundled persistence manager is going to
potentially write to more than one RDBMS or file system - blobs, for
example, are hashed out to the file system in the default bundle
manager. I believe these blobs bypass the changelog (like blob writing
in most systems). Therefore, while you cannot ever guarantee true
transactionality on the fs, I think Jackrabbit could come a bit closer
by allowing the RDBMS node write and fs blob write to participate in
the XA.

Thoughts?
> ---------- Forwarded message ----------
> From: "Dominique Pfister" <do...@day.com>
> To: dev@jackrabbit.apache.org
> Date: Wed, 25 Jul 2007 11:37:06 +0200
> Subject: Re: Questions about TX in Jackrabbit, JTA and Spec compliance
> Hi Marcel,
>
> On 7/24/07, Marcel May <ma...@consol.de> wrote:
> > Jackrabbit JCA basically wraps Jackrabbit Core, but still all the Core
> > PersistenceManager and FileSystem implementations
> > are used. These, as you mentioned as well, use and manage their own
> > JDBC connections and therefore can never be JTA/XA compliant:
> >
> > - JTA/XA requires using a (distributed) transaction manager
> > - Jackrabbit directly invokes setAutoCommit/commit/rollback without a
> > transaction manager (illegal in JTA/XA terms!)
> > - Jackrabbit  Workspace with a DB FileSystem and DB PersistenceManager
> > have two separate configured connections w/o a transaction manager.
> >
> > Example:
> > - If Jackrabbit rolls back a TX directly on a connection, the
> > distributed transaction will not know about this.
> > - If the distributed TX is rolled back, Jackrabbit might already have
> > invoked con.commit() ... therefore no
> >   rollback is possible.
>
> When using Jackrabbit JCA, every repository operation made on behalf
> of a distributed transaction is recorded in a "change log", something
> not associated with the JDBC connection used normally. This change log
> will not be persisted on individual "save" calls, but only when the
> respective method calls on the XAResource interface, exposed by
> Jackrabbit JCA, are invoked. Therefore, I don't think the situations
> you describe are actually encountered.
>
> > Spec says a JCR impl can support TXs, and if it supports TXs it must
> > support JTA. Right?
>
> I'd say so.
>
> > The Jackrabbit impl. can not be transactional on workspace level if
> > internally a
> > database PersistenceManager and a database FileSystem each have their
> > own database connection:
> > An operation spawns the persistence manager (=pm) and the filesystem
> > (=fm), right?
> > If one part (fm/pm) succeeds and is committed, the other part (fm/pm)
> > might fail and
> > therefore violate the ACID principle?
> > How do the two db connections of PM and FS work together?
> > This IMO can only be managed by  JTA/XA.
>
> AFAIK, the FS is mainly used for configuration purposes and therefore
> plays an important role on startup. PM, on the other side, is the one
> used when it comes to saving content in the repository. You're right,
> that a combination of PM/FS operations is conceivable where one side
> reports success and the other doesn't, but that shouldn't happen in
> real life.
>
> Again, when using Jackrabbit JCA, every operation that could
> potentially end up in a JDBC call writing some data, is rather logged
> to some internal storage and only executed when the distributed
> transaction is committed. It is only at that point in time, that all
> changes are written at one time using the PM's JDBC connection.
>
> Cheers
> Dominique
>
> > P.S.: I'd be willing to provide a documentation patch at the end of this discussion :-)
>
> Always happy to find some volunteers :-)
>
>

Anyone look at Carbonado?

Posted by Porter Woodward <pw...@practicable.net>.
A friend of mine just linked me to this:

http://carbonado.sourceforge.net/

Not quite a Java Content Repository API, but interesting
nonetheless. Some highlights:

1>  Java 5 Required - it makes use of Annotations and Generics.
2>  Can be backed by JDBC relational datastore, or BDB
3>  Transactional Support

Almost seems like a hybrid JCR / Hibernate Persistence type of tool.

Since it's open source, I'm wondering if there is any value in looking 
at and comparing approaches used in it and Jackrabbit.

- Porter

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Marcel May <ma...@consol.de>.
Cris Daniluk wrote:
>> The changelog is filled with the operations BEFORE the transaction is
>> committed, and its contents are part of the logical view, as far as
>> node traversal is concerned. In other words, before the transaction is
>> committed, you will be the only one seeing those changes, and after
>> commit, everyone will. However, if JR crashes before the changelog has
>> been saved to the RDBMS, the changelog will be lost, as it is
>> memory-based.
>>
>>     
>
> This is where our concern comes in. Based on your explanation,
> Jackrabbit is not honoring the JTA spec, nor the general ACID
> transaction principles (durability, notably). The fact that the
> committed transaction rolls into the logical view is great, but the
> fact that there is no flush to permanent storage is not.
>
> The JTA spec is bound to the X/Open DTP standard, available at
> http://www.opengroup.org/onlinepubs/009680699/toc.pdf
>
> I think the spec clearly sets the expectation for transaction
> permanence, and I believe that Jackrabbit clearly misses that, so
> while I think that the JTA support offered is valuable, it is not
> truly compliant--probably in a way that would be surprising to most
> JTA users.
>
>   
I'm no XA expert either, but I agree with Cris:

If Jackrabbit does not store the ChangeLog in phase one (prepare),
the changes can potentially be lost before phase two (commit)
succeeds. A successful phase-one prepare guarantees that no changes
are lost. So, IMO, the ChangeLog must be persisted in phase one
(Jackrabbit would then resume after whatever failure occurred and
finish the XA TX by executing the recorded ChangeLog).
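The durability argument above can be sketched in a few lines of toy Java. All class and method names here are invented for illustration and do not correspond to Jackrabbit's actual code: the point is simply that persisting the change log during prepare() means a crash between the two phases loses nothing, because commit() can replay from stable storage after recovery.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: prepare() persists the change log, so a crash between
// prepare and commit is recoverable. Illustrative names only.
class ChangeLog {
    final List<String> ops = new ArrayList<>();
    void add(String op) { ops.add(op); }
}

class DurableResource {
    final List<String> stableStorage = new ArrayList<>(); // stands in for a file/DB
    final List<String> repositoryState = new ArrayList<>();
    boolean prepared = false;

    // Phase 1: persist the change log before voting "ready".
    void prepare(ChangeLog log) {
        stableStorage.addAll(log.ops); // would be fsync'd in a real implementation
        prepared = true;
    }

    // Phase 2: apply the persisted log; safe to repeat after a crash.
    void commit() {
        if (!prepared) throw new IllegalStateException("commit without prepare");
        repositoryState.addAll(stableStorage);
    }

    // Crash simulation: volatile state is lost, stable storage survives.
    DurableResource recover() {
        DurableResource r = new DurableResource();
        r.stableStorage.addAll(stableStorage);
        r.prepared = !stableStorage.isEmpty();
        return r;
    }
}

public class PrepareDurability {
    public static void main(String[] args) {
        ChangeLog log = new ChangeLog();
        log.add("setProperty /a title=x");

        DurableResource res = new DurableResource();
        res.prepare(log);

        // Crash after prepare, before commit: the persisted log survives,
        // so the coordinator can still drive the TX to completion.
        DurableResource recovered = res.recover();
        recovered.commit();
        System.out.println(recovered.repositoryState);
    }
}
```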

>>> If the XA includes Jackrabbit AND the RDBMS AND any other outside
>>> participants that may be relevant, it could not be rolled back without
>>> Jackrabbit knowing. I'm not sure I understand where Jackrabbit could
>>> be "left out of the loop" on a rollback?
>>>       
>> I just have some concerns about the flow of control: what JR is
>> supposed to do with its associated JDBC connection when a XA TX is
>> prepared, committed or rolled back. Do I get your point here: instead
>> of using a changelog, continuously write changes made by the client
>> via XA capable JDBC connection to the database, using the fact that
>> uncommitted changes are only visible to that user?
>>
>>     
>
> If the DBMS supports the two-phase transaction (I believe Postgres
> does), then you could just use a JTA-enabled version of the JDBC
> driver and register the DB transaction to the existing XA. Then, while
> you execute the SQL directly to the RDBMS, it would not be visible as
> it is not committed. When the global transaction is committed, the
> DBMS would receive the two-phase commit request(s) and do the right
> thing automatically.
>
> The only other option is to persist the changelog, effectively
> converting it into a journal. However, I think bringing the DBMS into
> the XA is probably the quickest way to solve this problem..
>
> - Cris
>   

This would be a nice solution I guess.

As a result of this discussion, should we open a JIRA issue for JR/JCA?
I don't think one exists for this yet.

Thanks a lot, Cris and Dominique!
This discussion was very helpful for my Jackrabbit internal understanding.

Cheers,
Marcel


Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Cris Daniluk <cr...@gmail.com>.
> > I would recommend creating a TransactionalPersistenceManager interface
> > and implementing it where appropriate. If someone tries to register an
> > XA transaction while a non-transactional PM is in use, it would throw
> > an immediate UnsupportedOperationException.
> >
> > - Cris
> >
>
> This is a good argument, and it was at one point taken into
> consideration while implementing JTA support in Jackrabbit. The
> problem with this approach turned out to be the following: the PM API
> has no notion of the "identity" or "origin" of a change: it is meant
> to be a very simple interface that unconditionally and atomically
> stores nodes and properties in whatever format it likes. The PM in
> turn is called by the SharedItemStateManager, which maintains the
> cache of all items that have neither been modified in the transient
> space nor saved but not yet committed. If handling transactions were
> delegated to a transactional PM, not only this component but also the
> SharedItemStateManager would need considerable rework.
>
> Dominique
>

This is an interesting point, and I guess it underscores the
complexity of transactions and the general reason that MySQL went for
10 years saying they sucked before they figured out how to properly
implement them :)

Clearly, the level of effort described puts this beyond a 1.x change
(or 1.3.x, at least). It nonetheless seems like the most appropriate
approach.

What would be the most appropriate way to discuss something like this
in "code form"? For example, if I were to submit a fairly hefty patch
from a relatively recent stable-line build, would someone be able to
incorporate it, if appropriate? I'm not sure it even makes sense to
talk about in this fashion, since it is almost too large to handle in
a single patch, but I'm not sure how to go about moving forward.

- Cris

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Dominique Pfister <do...@day.com>.
On 8/9/07, Cris Daniluk <cr...@gmail.com> wrote:
> After reviewing some of the code, I think the problem is quite simply
> that Jackrabbit does something inherently incorrect - it attempts to
> provide guaranteed transactionality for all persistence managers. I
> think the transactionality must be applied and enforced at the PM,
> with help from the core API as well. Not all PMs are transactional,
> nor are they required to be by the JCR. In other words, a provider
> could give the optional transactional support for some of its PMs but
> not all.
>
> I think the intent of that decision was for this very reason.
>
> I would recommend creating a TransactionalPersistenceManager interface
> and implementing it where appropriate. If someone tries to register an
> XA transaction while a non-transactional PM is in use, it would throw
> an immediate UnsupportedOperationException.
>
> - Cris
>

This is a good argument, and it was at one point taken into
consideration while implementing JTA support in Jackrabbit. The
problem with this approach turned out to be the following: the PM API
has no notion of the "identity" or "origin" of a change: it is meant
to be a very simple interface that unconditionally and atomically
stores nodes and properties in whatever format it likes. The PM in
turn is called by the SharedItemStateManager, which maintains the
cache of all items that have neither been modified in the transient
space nor saved but not yet committed. If handling transactions were
delegated to a transactional PM, not only this component but also the
SharedItemStateManager would need considerable rework.
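As a rough, hypothetical sketch of the call chain described above (none of these signatures match Jackrabbit's real classes), the SharedItemStateManager owns the shared item cache and hands whole change sets to a PM that knows nothing about which transaction they came from:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a PM that stores change sets atomically but has
// no notion of the "identity" or "origin" of a change.
interface PersistenceManager {
    void store(Map<String, String> changeSet);
}

// Illustrative only: owns the shared cache, delegates persistence.
class SharedItemStateManager {
    private final Map<String, String> cache = new HashMap<>();
    private final PersistenceManager pm;

    SharedItemStateManager(PersistenceManager pm) { this.pm = pm; }

    void save(Map<String, String> changeSet) {
        pm.store(changeSet);     // persist first...
        cache.putAll(changeSet); // ...then publish into the shared cache
    }

    String get(String id) { return cache.get(id); }
}

public class CallChain {
    public static void main(String[] args) {
        List<Map<String, String>> persisted = new ArrayList<>();
        SharedItemStateManager sism = new SharedItemStateManager(persisted::add);

        sism.save(Map.of("/node1", "propA=1"));
        System.out.println(sism.get("/node1"));
        System.out.println(persisted.size());
    }
}
```

Making the PM transactional would mean threading transaction identity through both layers, which is the rework Dominique refers to.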

Dominique

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Padraic Hannon <pi...@wasabicowboy.com>.
I logged a similar issue: https://issues.apache.org/jira/browse/JCR-1050.

It seems to me that Jackrabbit needs to decide whether it is a data
store or whether it leverages one. If it leverages one, it should
delegate transactional handling to the data store. If it is one, there
is a lot of work to be done to ensure that transaction isolation
levels are respected; synchronization should not be the default state
of the repository. Regardless of this question, it seems that there is
a lot of work to be done to rework some of the persistence mechanisms
to delegate transaction management to the container. I understand that
there is some thought of running Jackrabbit embedded; so far, however,
it seems fairly closely tied to running within (at least) a servlet
container. I like the suggestion of creating different PMs for
different cases. Further use of OSGi (as suggested by Sling) could
create a nice plugin mechanism for leveraging different PMs (although
the current XML-based configuration files also address this).

-paddy


Cris Daniluk wrote:
>>> Hm, I doubt that. JR at its core stores content with calls to an
>>> abstract interface, namely the Persistence Manager. It is this
>>> component that may store nodes and properties inside a DBMS, via JDBC
>>> calls over a single connection it acquired on startup. Changing this
>>> to using a transaction-local JDBC connection looks more complex to me
>>> than persisting the change log.
>>>
>>>       
>> Based on that design, persisting the change log is probably the way to
>> go. If you're using a single connection to write back to the DBMS,
>> that log could easily become very lengthy. I think the performance
>> overhead of writing to disk and flushing it would be utterly
>> insignificant (synchronous db writing is far more expensive
>> performance-wise, yet JR still exhibits solid performance).
>>
>>     
>
> After reviewing some of the code, I think the problem is quite simply
> that Jackrabbit does something inherently incorrect - it attempts to
> provide guaranteed transactionality for all persistence managers. I
> think the transactionality must be applied and enforced at the PM,
> with help from the core API as well. Not all PMs are transactional,
> nor are they required to be by the JCR. In other words, a provider
> could give the optional transactional support for some of its PMs but
> not all.
>
> I think the intent of that decision was for this very reason.
>
> I would recommend creating a TransactionalPersistenceManager interface
> and implementing it where appropriate. If someone tries to register an
> XA transaction while a non-transactional PM is in use, it would throw
> an immediate UnsupportedOperationException.
>
> - Cris
>   


Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Cris Daniluk <cr...@gmail.com>.
> > Hm, I doubt that. JR at its core stores content with calls to an
> > abstract interface, namely the Persistence Manager. It is this
> > component that may store nodes and properties inside a DBMS, via JDBC
> > calls over a single connection it acquired on startup. Changing this
> > to using a transaction-local JDBC connection looks more complex to me
> > than persisting the change log.
> >
>
> Based on that design, persisting the change log is probably the way to
> go. If you're using a single connection to write back to the DBMS,
> that log could easily become very lengthy. I think the performance
> overhead of writing to disk and flushing it would be utterly
> insignificant (synchronous db writing is far more expensive
> performance-wise, yet JR still exhibits solid performance).
>

After reviewing some of the code, I think the problem is quite simply
that Jackrabbit does something inherently incorrect - it attempts to
provide guaranteed transactionality for all persistence managers. I
think the transactionality must be applied and enforced at the PM,
with help from the core API as well. Not all PMs are transactional,
nor are they required to be by the JCR. In other words, a provider
could give the optional transactional support for some of its PMs but
not all.

I think the intent of that decision was for this very reason.

I would recommend creating a TransactionalPersistenceManager interface
and implementing it where appropriate. If someone tries to register an
XA transaction while a non-transactional PM is in use, it would throw
an immediate UnsupportedOperationException.
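As a sketch of what that proposal could look like (hypothetical names; nothing here exists in Jackrabbit's codebase), only PMs that opt in would advertise XA support, and everything else would fail fast at registration time:

```java
// Hypothetical interfaces illustrating the proposal; these names are
// invented and do not exist in Jackrabbit.
interface PersistenceManager {
    void store(String node);
}

interface TransactionalPersistenceManager extends PersistenceManager {
    void prepare();
    void commit();
    void rollback();
}

public class XaRegistration {
    // Would be called when an XA transaction is registered against the
    // repository: reject non-transactional PMs immediately.
    static TransactionalPersistenceManager requireTransactional(PersistenceManager pm) {
        if (!(pm instanceof TransactionalPersistenceManager)) {
            throw new UnsupportedOperationException(
                "persistence manager " + pm.getClass().getName()
                + " does not support XA transactions");
        }
        return (TransactionalPersistenceManager) pm;
    }

    public static void main(String[] args) {
        PersistenceManager plain = node -> {}; // a PM without XA support
        try {
            requireTransactional(plain);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: non-transactional PM");
        }
    }
}
```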

- Cris

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Cris Daniluk <cr...@gmail.com>.
On 8/6/07, Dominique Pfister <do...@day.com> wrote:
> On 8/6/07, Cris Daniluk <cr...@gmail.com> wrote:
> >
> > If the DBMS supports the two-phase transaction (I believe Postgres
> > does), then you could just use a JTA-enabled version of the JDBC
> > driver and register the DB transaction to the existing XA.
>
> How would you do that, technically, if everything you have at hand is
> your own XAResource implementation? Am I missing some important
> information the TM is giving me about the current transaction and some
> way to dynamically register new resources?
>

Hmm. I'm not sure... I will need to look in more detail at the way JR does it.

> >
> > The only other option is to persist the changelog, effectively
> > converting it into a journal. However, I think bringing the DBMS into
> > the XA is probably the quickest way to solve this problem..
> >
>
> Hm, I doubt that. JR at its core stores content with calls to an
> abstract interface, namely the Persistence Manager. It is this
> component that may store nodes and properties inside a DBMS, via JDBC
> calls over a single connection it acquired on startup. Changing this
> to using a transaction-local JDBC connection looks more complex to me
> than persisting the change log.
>

Based on that design, persisting the change log is probably the way to
go. If you're using a single connection to write back to the DBMS,
that log could easily become very lengthy. I think the performance
overhead of writing to disk and flushing it would be utterly
insignificant (synchronous db writing is far more expensive
performance-wise, yet JR still exhibits solid performance).

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Dominique Pfister <do...@day.com>.
On 8/6/07, Cris Daniluk <cr...@gmail.com> wrote:
>
> If the DBMS supports the two-phase transaction (I believe Postgres
> does), then you could just use a JTA-enabled version of the JDBC
> driver and register the DB transaction to the existing XA.

How would you do that, technically, if everything you have at hand is
your own XAResource implementation? Am I missing some important
information the TM is giving me about the current transaction and some
way to dynamically register new resources?

>
> The only other option is to persist the changelog, effectively
> converting it into a journal. However, I think bringing the DBMS into
> the XA is probably the quickest way to solve this problem..
>

Hm, I doubt that. JR at its core stores content with calls to an
abstract interface, namely the Persistence Manager. It is this
component that may store nodes and properties inside a DBMS, via JDBC
calls over a single connection it acquired on startup. Changing this
to using a transaction-local JDBC connection looks more complex to me
than persisting the change log.

Dominique

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Cris Daniluk <cr...@gmail.com>.
>
> The changelog is filled with the operations BEFORE the transaction is
> committed, and its contents are part of the logical view, as far as
> node traversal is concerned. In other words, before the transaction is
> committed, you will be the only one seeing those changes, and after
> commit, everyone will. However, if JR crashes before the changelog has
> been saved to the RDBMS, the changelog will be lost, as it is
> memory-based.
>

This is where our concern comes in. Based on your explanation,
Jackrabbit is not honoring the JTA spec, nor the general ACID
transaction principles (durability, notably). The fact that the
committed transaction rolls into the logical view is great, but the
fact that there is no flush to permanent storage is not.

The JTA spec is bound to the X/Open DTP standard, available at
http://www.opengroup.org/onlinepubs/009680699/toc.pdf

I think the spec clearly sets the expectation for transaction
permanence, and I believe that Jackrabbit clearly misses that, so
while I think that the JTA support offered is valuable, it is not
truly compliant--probably in a way that would be surprising to most
JTA users.

> > If the XA includes Jackrabbit AND the RDBMS AND any other outside
> > participants that may be relevant, it could not be rolled back without
> > Jackrabbit knowing. I'm not sure I understand where Jackrabbit could
> > be "left out of the loop" on a rollback?
>
> I just have some concerns about the flow of control: what JR is
> supposed to do with its associated JDBC connection when a XA TX is
> prepared, committed or rolled back. Do I get your point here: instead
> of using a changelog, continuously write changes made by the client
> via XA capable JDBC connection to the database, using the fact that
> uncommitted changes are only visible to that user?
>

If the DBMS supports the two-phase transaction (I believe Postgres
does), then you could just use a JTA-enabled version of the JDBC
driver and register the DB transaction to the existing XA. Then, while
you execute the SQL directly to the RDBMS, it would not be visible as
it is not committed. When the global transaction is committed, the
DBMS would receive the two-phase commit request(s) and do the right
thing automatically.

The only other option is to persist the changelog, effectively
converting it into a journal. However, I think bringing the DBMS into
the XA is probably the quickest way to solve this problem..
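The two-phase flow being proposed can be sketched with a toy coordinator (illustrative only; a real transaction manager is far more involved, handling recovery, heuristics, and the XA wire protocol). Once the DB's resource is enlisted alongside Jackrabbit's, a global commit drives prepare() on every participant before any commit(), so a failure on either side rolls back both:

```java
import java.util.List;

// Toy participant in a two-phase commit; names are illustrative, not
// from any real transaction manager implementation.
interface Participant {
    boolean prepare(); // vote: true = ready to commit
    void commit();
    void rollback();
}

public class TwoPhaseCommit {
    static boolean commitGlobal(List<Participant> participants) {
        // Phase 1: every participant must vote yes.
        for (Participant p : participants) {
            if (!p.prepare()) {
                participants.forEach(Participant::rollback);
                return false;
            }
        }
        // Phase 2: all votes were yes, so commit everywhere.
        participants.forEach(Participant::commit);
        return true;
    }

    public static void main(String[] args) {
        Participant jackrabbit = new Participant() {
            public boolean prepare() { return true; }
            public void commit() { System.out.println("JR commit"); }
            public void rollback() { System.out.println("JR rollback"); }
        };
        Participant rdbms = new Participant() {
            public boolean prepare() { return false; } // e.g. integrity violation
            public void commit() { System.out.println("DB commit"); }
            public void rollback() { System.out.println("DB rollback"); }
        };
        System.out.println(commitGlobal(List.of(jackrabbit, rdbms)));
    }
}
```

Here the RDBMS vetoing in phase one (the integrity-violation scenario discussed earlier in the thread) forces both participants to roll back, which is exactly what the in-memory changelog alone cannot guarantee.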

- Cris

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Dominique Pfister <do...@day.com>.
Hi Cris,

On 8/2/07, Cris Daniluk <cr...@gmail.com> wrote:
> In Oracle, a committed transaction means that it is in the redo log,
> but not necessarily written to the tablespace. However if you combine
> the tablespace+log, you are guaranteed to get a consistent
> point-in-time view of that transaction. Oracle could, and often does,
> have trouble writing from the log out to the tablespace (corruption,
> insufficient space, whatever), but there is no loss of data. You can
> further back up to that transaction and regardless of the location
> (tablespace or log) you are covered. I realize that this is a pretty
> crappy, simplified description of journaling, but it might help frame
> our discussion.
>
> My concern with Jackrabbit is whether the changelog is a true journal
> or a mere queue for the database. For example, once a transaction is
> committed and written to the changelog, but before it is written to
> the RDBMS, is it part of the "logical view"? In other words, if I
> query JR before the flush to the DB, will I see my newly committed
> data? If I crash before the RDBMS write happens and start up, am I
> safe?

The changelog is filled with the operations BEFORE the transaction is
committed, and its contents are part of the logical view, as far as
node traversal is concerned. In other words, before the transaction is
committed, you will be the only one seeing those changes, and after
commit, everyone will. However, if JR crashes before the changelog has
been saved to the RDBMS, the changelog will be lost, as it is
memory-based.

> If the XA includes Jackrabbit AND the RDBMS AND any other outside
> participants that may be relevant, it could not be rolled back without
> Jackrabbit knowing. I'm not sure I understand where Jackrabbit could
> be "left out of the loop" on a rollback?

I just have some concerns about the flow of control: what JR is
supposed to do with its associated JDBC connection when a XA TX is
prepared, committed or rolled back. Do I get your point here: instead
of using a changelog, continuously write changes made by the client
via XA capable JDBC connection to the database, using the fact that
uncommitted changes are only visible to that user?

Kind regards
Dominique

>
> Thanks for your responses thus far!
>
> - Cris
>

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Cris Daniluk <cr...@gmail.com>.
> Cris, thanks a lot for your comments: they helped me understand what
> Marcel's concerns are about the way Jackrabbit implements XA.
>

I think this is a good discussion to have in general. Marcel and I
both have some concerns, definitely - but I'm not yet sure they're
valid, as my points below show. Basically it just comes down to the
level of sophistication in Jackrabbit's journaling.

> > Marcel's point here is that the JTA implementation doesn't allow the
> > RDBMS transaction to participate in the XA. I can see a good argument
> > for this - after all, Jackrabbit maintains an effective journal and
> > not all RDBMS can participate in XA.
> >
> > That said, by the truest definition of a transaction, does just
> > writing to the changelog truly constitute a guaranteed transaction?
> > What if the RDBMS cannot be written to due to an integrity violation?
> > I don't think the cohesion between the RDBMS and the Jackrabbit
> > implementation is so tight that it is fair to argue any inconsistency
> > would be similar to datafile corruption.
>
> Where would that integrity violation come from? If you think of some
> clustered environment, and some other node in a clustered environment
> has made some modifications that changed the same items this node is
> trying to update, it will get informed about the staleness of its copy
> and throw. IMHO, looking at the very basic data model Jackrabbit uses
> and if we rule out other programs that tamper around with the data, I
> don't think this should happen.
>

In Oracle, a committed transaction means that it is in the redo log,
but not necessarily written to the tablespace. However if you combine
the tablespace+log, you are guaranteed to get a consistent
point-in-time view of that transaction. Oracle could, and often does,
have trouble writing from the log out to the tablespace (corruption,
insufficient space, whatever), but there is no loss of data. You can
further back up to that transaction and regardless of the location
(tablespace or log) you are covered. I realize that this is a pretty
crappy, simplified description of journaling, but it might help frame
our discussion.

My concern with Jackrabbit is whether the changelog is a true journal
or a mere queue for the database. For example, once a transaction is
committed and written to the changelog, but before it is written to
the RDBMS, is it part of the "logical view"? In other words, if I
query JR before the flush to the DB, will I see my newly committed
data? If I crash before the RDBMS write happens and start up, am I
safe?

>
> I'm not sure I follow you there: do you suggest that Jackrabbit, when
> used with XA, uses some DB connection that is itself part of the same
> XA transaction and managed by the transaction manager?

This is what I'm suggesting for discussion, though I'm not necessarily
at a point where I'm suggesting a change be made :)

> I could then
> imagine that some change made inside Jackrabbit to the database will
> later be revoked because another part of the XA transaction has
> failed, without Jackrabbit noticing it, which would lead to
> inconsistencies.
>

If the XA includes Jackrabbit AND the RDBMS AND any other outside
participants that may be relevant, it could not be rolled back without
Jackrabbit knowing. I'm not sure I understand where Jackrabbit could
be "left out of the loop" on a rollback?

Thanks for your responses thus far!

- Cris

Re: Questions about TX in Jackrabbit, JTA and Spec compliance

Posted by Dominique Pfister <do...@day.com>.
Hi Cris,

On 8/2/07, Cris Daniluk <cr...@gmail.com> wrote:
> I've been observing this thread, pondering, and feel the need to weigh in.
>

Cris, thanks a lot for your comments: they helped me understand what
Marcel's concerns are about the way Jackrabbit implements XA.

> Marcel's point here is that the JTA implementation doesn't allow the
> RDBMS transaction to participate in the XA. I can see a good argument
> for this - after all, Jackrabbit maintains an effective journal and
> not all RDBMS can participate in XA.
>
> That said, by the truest definition of a transaction, does just
> writing to the changelog truly constitute a guaranteed transaction?
> What if the RDBMS cannot be written to due to an integrity violation?
> I don't think the cohesion between the RDBMS and the Jackrabbit
> implementation is so tight that it is fair to argue any inconsistency
> would be similar to datafile corruption.

Where would that integrity violation come from? If you think of some
clustered environment, and some other node in a clustered environment
has made some modifications that changed the same items this node is
trying to update, it will get informed about the staleness of its copy
and throw. IMHO, looking at the very basic data model Jackrabbit uses
and if we rule out other programs that tamper around with the data, I
don't think this should happen.

> Also, as Marcel noted, a bundled persistence manager is going to
> potentially write to more than one RDBMS or file system - blobs, for
> example, are hashed out to the file system in the default bundle
> manager. I believe these blobs bypass the changelog (like blob writing
> in most systems).

I'd say that blobs use some temporary storage until they are saved, in
other words data is not written to shared storage before the
transaction is actually committed. Please correct me if I'm wrong.
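The staging pattern Dominique describes might look roughly like this (paths and names are invented for illustration, not taken from Jackrabbit): write the blob to a private temporary file, and publish it into the shared store with an atomic move only at commit time, so uncommitted data never becomes visible.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative blob staging: temp file first, atomic move on commit.
public class BlobStaging {
    public static void main(String[] args) throws Exception {
        Path store = Files.createTempDirectory("blobstore"); // shared storage
        Path tmp = Files.createTempFile("blob", ".tmp");     // private staging area
        Files.write(tmp, "binary content".getBytes());

        // Before commit: nothing is visible in the shared store.
        System.out.println(Files.list(store).count());

        // On commit: an atomic move publishes the blob.
        Path target = store.resolve("blob-1.bin");
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        System.out.println(Files.list(store).count());
        System.out.println(new String(Files.readAllBytes(target)));
    }
}
```

On rollback (or a crash before the move), the temporary file is simply discarded and the shared store is untouched.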

> Therefore, while you cannot ever guarantee true
> transactionality on the fs, I think Jackrabbit could come a bit closer
> by allowing the RDBMS node write and fs blob write to participate in
> the XA.

I'm not sure I follow you there: do you suggest that Jackrabbit, when
used with XA, uses some DB connection that is itself part of the same
XA transaction and managed by the transaction manager? I could then
imagine that some change made inside Jackrabbit to the database will
later be revoked because another part of the XA transaction has
failed, without Jackrabbit noticing it, which would lead to
inconsistencies.

Kind regards
Dominique

>
> Thoughts?
> > ---------- Forwarded message ----------
> > From: "Dominique Pfister" <do...@day.com>
> > To: dev@jackrabbit.apache.org
> > Date: Wed, 25 Jul 2007 11:37:06 +0200
> > Subject: Re: Questions about TX in Jackrabbit, JTA and Spec compliance
> > Hi Marcel,
> >
> > On 7/24/07, Marcel May <ma...@consol.de> wrote:
> > > Jackrabbit JCA basically wraps Jackrabbit Core, but still all the Core
> > > PersistenceManager and FileSystem implementations
> > > are used. These, as you mentioned as well, use and manage their own
> > > JDBC connections and therefore can never be JTA/XA compliant:
> > >
> > > - JTA/XA requires using a (distributed) transaction manager
> > > - Jackrabbit directly invokes setAutoCommit/commit/rollback without a
> > > transaction manager (illegal in JTA/XA terms!)
> > > - Jackrabbit  Workspace with a DB FileSystem and DB PersistenceManager
> > > have two separate configured connections w/o a transaction manager.
> > >
> > > Example:
> > > - If Jackrabbit rolls back a TX directly on a connection, the
> > > distributed transaction will not know about this.
> > > - If the distributed TX is rolled back, Jackrabbit might already have
> > > invoked con.commit() ... therefore no
> > >   rollback is possible.
> >
> > When using Jackrabbit JCA, every repository operation made on behalf
> > of a distributed transaction is recorded in a "change log", something
> > not associated with the JDBC connection used normally. This change log
> > will not be persisted on individual "save" calls, but only when the
> > respective method calls on the XAResource interface, exposed by
> > Jackrabbit JCA, are invoked. Therefore, I don't think the situations
> > you describe are actually encountered.
> >
> > > Spec says a JCR impl can support TXs, and if it supports TXs it must
> > > support JTA. Right?
> >
> > I'd say so.
> >
> > > The Jackrabbit impl. can not be transactional on workspace level if
> > > internally a
> > > database PersistenceManager and a database FileSystem each have their
> > > own database connection:
> > > An operation spawns the persistence manager (=pm) and the filesystem
> > > (=fm), right?
> > > If one part (fm/pm) succeeds and is committed, the other part (fm/pm)
> > > might fail and
> > > therefore violate the ACID principle?
> > > How do the two db connections of PM and FS work together?
> > > This IMO can only be managed by  JTA/XA.
> >
> > AFAIK, the FS is mainly used for configuration purposes and therefore
> > plays an important role on startup. PM, on the other side, is the one
> > used when it comes to saving content in the repository. You're right,
> > that a combination of PM/FS operations is conceivable where one side
> > reports success and the other doesn't, but that shouldn't happen in
> > real life.
> >
> > Again, when using Jackrabbit JCA, every operation that could
> > potentially end up in a JDBC call writing some data, is rather logged
> > to some internal storage and only executed when the distributed
> > transaction is committed. It is only at that point in time, that all
> > changes are written at one time using the PM's JDBC connection.
> >
> > Cheers
> > Dominique
> >
> > > P.S.: I'd be willing to provide a documentation patch at the end of this discussion :-)
> >
> > Always happy to find some volunteers :-)
> >
> >
>