Posted to dev@jackrabbit.apache.org by Peter Travas <pt...@gmail.com> on 2007/10/03 14:02:40 UTC

Is this issue resolved?

I've hit this page: http://blog.jinspired.com/?p=40 when reading about JSR
on http://www.infoq.com/articles/spring-modules-jcr


<cite>
" The following blog entries are all based on transactional analysis work I
recently performed for a customer that was having repository corruption
issues in the pre-production phase of a project based on JackRabbit.

Transaction Integrity Inspections
blog.jinspired.com/?p=37

More Transaction Integrity Inspections
blog.jinspired.com/?p=39

Concurrent Transactional Access
blog.jinspired.com/?p=40

regards, William
</cite>


Has this issue already been addressed? If so, in which version (bug id
reference)?
It sounds quite serious. I think that you should post a reply on this thread
because it sounds scary.

Regards,
P.

Re: Is this issue resolved?

Posted by William Louth <wi...@jinspired.com>.

Hi,

I really do not want to create such a bad impression of the work of the
team. There are transactional issues that need to be addressed; some are
easier and others more architectural. I think it really comes down to a
trade-off between ease of use (JCR is a nice storage-agnostic API) and
enterprise 'ilities such as reliability. Every software (and vendor)
selection involves risk management, but the level of risk is something you
must decide in terms of your operating environment.

In a non-clustered environment the product worked and was extremely easy to
set up and get off to a flying start. It was when it came to heavy workloads
and fault tolerance in a clustered environment that we started having
doubts, at least in the performance tests I was involved in.

To be honest, the customer has migrated over to the EXO platform, but
everything is not so perfect there either. That said, the clustering appears
much more reliable, but I am worried about the speed of rebuilding Lucene
indexes when a node in the cluster becomes disconnected, as there is no event
log/journal like in JackRabbit. At this stage in the quality of JCR
implementations it's like "picking your own poison".

regards,

William

-- 
View this message in context: http://www.nabble.com/Is-this-issue-resolved--tf4561381.html#a13106334
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.

Re: Is this issue resolved?

Posted by Peter Travas <pt...@gmail.com>.

William,

Thanks for answering in this thread. Have you filed JIRA issues reporting
the problems encountered during your tests? It looks like you spent a lot of
time on this. Can you, with your experience in the HA and transaction
processing field, help with these issues?
Your blog entry (and some comments about poor scalability on the TSS thread
provided by Jukka) are killing my overall good impression of JR and JCR.

I'm responsible for deciding whether JR is a go or no-go for my
project and, to be honest, after this bad press I'm simply afraid of being
hanged by my teammates in the near future if such scalability problems
appear.

We had some _extremely_ bad experiences with another Apache project, so my
team is a little bit touchy in this area :(:(:(:(.

Re: Is this issue resolved?

Posted by William Louth <wi...@jinspired.com>.

Hi,

"CLUSTER_GLOBAL_REVISION SET revision = revision + 1"

Yes, this was the case I mentioned where JXInsight had incorrectly classified
the transactional behavior, but I do believe that this was a mistake in
the blog entry and that on another table there was a similar update of a
sequence without the read within the statement itself. I believe we have
archived the snapshot, so I should be able to come back with the actual
offending statement this week.

I am not certain, but I believe the issues were all reported with 1.2 up to
(and maybe including) 1.3. I will check back with the customer.

regards,

William


Re: Is this issue resolved?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/9/07, William Louth <wi...@jinspired.com> wrote:
> I will send you some screen shots of the call stacks in question which should
> help you determine whether the problem has long gone.

OK, thanks!

BR,

Jukka Zitting

Re: Is this issue resolved?

Posted by William Louth <wi...@jinspired.com>.

I will send you some screen shots of the call stacks in question which should
help you determine whether the problem has long gone.

I had a quick scan of one snapshot, and the write-only access blog entry
should have referenced a SQL update statement on the FSENTRY table where the
LASTMOD and LENGTH columns are updated but not accessed in the WHERE
clause.
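
To make the pattern concrete, here is a minimal sketch in Python using the
built-in sqlite3 module as a stand-in database. It is not Jackrabbit code and
not something proposed in this thread. Only the FSENTRY, LASTMOD and LENGTH
names come from the message above; the PATH column, the sample row and both
helper functions are hypothetical. The "guarded" variant adds an optimistic
check on LASTMOD, which is one common way to detect that another writer got
there first.

```python
import sqlite3


def new_fsentry_db() -> sqlite3.Connection:
    """Create an in-memory FSENTRY table with one sample row (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FSENTRY (PATH TEXT PRIMARY KEY, LASTMOD INTEGER, LENGTH INTEGER)"
    )
    conn.execute("INSERT INTO FSENTRY VALUES ('/a', 100, 10)")
    conn.commit()
    return conn


def blind_update(conn, path, lastmod, length) -> int:
    """Write-only access: LASTMOD and LENGTH are set but never checked."""
    cur = conn.execute(
        "UPDATE FSENTRY SET LASTMOD = ?, LENGTH = ? WHERE PATH = ?",
        (lastmod, length, path),
    )
    conn.commit()
    return cur.rowcount  # number of rows changed


def guarded_update(conn, path, expected_lastmod, lastmod, length) -> int:
    """Hypothetical alternative: only update if LASTMOD is what we last saw."""
    cur = conn.execute(
        "UPDATE FSENTRY SET LASTMOD = ?, LENGTH = ? WHERE PATH = ? AND LASTMOD = ?",
        (lastmod, length, path, expected_lastmod),
    )
    conn.commit()
    return cur.rowcount  # 0 means someone else modified the row first
```

With the sample row, a blind update always reports one changed row, while a
guarded update that still expects the old LASTMOD value reports zero rows and
so exposes the conflict.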

regards

William

Re: Is this issue resolved?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/9/07, William Louth <wi...@jinspired.com> wrote:
> I am actually more than certain that these were reported against 1.3 but with
> clustering enabled. Most of the problems were encountered when the customer
> moved to a clustered configuration.

OK, this is interesting. Do you think the customer problems were
related to the transactions-in-multiple-threads issue, or are there
other problems? Also, what database backend and JDBC driver were they
using?

I'll see if I can set up JXInsight with a clustered Jackrabbit
instance to get some more input.

BR,

Jukka Zitting

Re: Is this issue resolved?

Posted by William Louth <wi...@jinspired.com>.

I am actually more than certain that these were reported against 1.3 but with
clustering enabled. Most of the problems were encountered when the customer
moved to a clustered configuration.

regards,

William


Re: Is this issue resolved?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/9/07, Jukka Zitting <ju...@gmail.com> wrote:
> Now, and this may well be a big problem in Jackrabbit, in clustered
> mode Jackrabbit actually does allow multiple clients to concurrently
> access and modify the underlying database. There is a separate
> clustering journal that is used to synchronize the cluster nodes, but
> I'm not too certain that all the race conditions are properly
> accounted for. I'll branch a separate thread for the details.

No worries here. As pointed out by Dominique (our clustering expert),
Jackrabbit will explicitly synchronize the cluster nodes over all
content updates.

BR,

Jukka Zitting

Re: Is this issue resolved?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/9/07, William Louth <wi...@jinspired.com> wrote:
> The tone reflects my frustration at having to go over this time and time
> again. I might as well be talking to the wall when it comes to Jackrabbit.

I'm sorry for that. Thanks anyway for following up on this, as your
input is very valuable, even if we've failed to show that.

> I did indicate there was one issue which I subsequently deemed dubious and
> possibly an incorrect assessment by the tool but the all other issues (4-5)
> were valid. We can all make mistakes but to continue to imply an assessment
> by a person who has worked many years in building tools for transaction
> processing analysis was incorrect is unacceptable and irresponsible to
> potential customers.

I don't doubt your experience, in fact I'm quite certain you are far
more experienced with transactions than I will ever be. See below for
a more detailed review of the points you raised on your posts about
Jackrabbit.

1) http://blog.jinspired.com/?p=37

The first warning you discuss is about the SQL statement "UPDATE
CLUSTER_GLOBAL_REVISION SET revision = revision + 1", where JXInsight
warns that the UPDATE statement is executed before reading anything
from the same table. However, the current value of the global revision
counter is actually read in the right hand side of the SET expression,
making previous SELECT statements unneeded.
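
A minimal sketch of why the in-statement read matters, using Python's
built-in sqlite3 module as a stand-in database (Jackrabbit itself is Java
over JDBC; nothing below is Jackrabbit code). The table and column names
come from the statement quoted above. The single-row schema, the helper
names, and the hand-simulated interleaving of two clients are assumptions
made purely for illustration.

```python
import sqlite3


def new_counter_db() -> sqlite3.Connection:
    """Create an in-memory revision counter starting at 0 (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CLUSTER_GLOBAL_REVISION (revision INTEGER NOT NULL)")
    conn.execute("INSERT INTO CLUSTER_GLOBAL_REVISION (revision) VALUES (0)")
    conn.commit()
    return conn


def read_revision(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT revision FROM CLUSTER_GLOBAL_REVISION").fetchone()[0]


def interleaved_read_then_write(conn: sqlite3.Connection) -> int:
    """Two simulated clients SELECT first, then each writes back 'seen + 1'."""
    seen_a = read_revision(conn)  # client A reads 0
    seen_b = read_revision(conn)  # client B also reads 0
    conn.execute("UPDATE CLUSTER_GLOBAL_REVISION SET revision = ?", (seen_a + 1,))
    conn.execute("UPDATE CLUSTER_GLOBAL_REVISION SET revision = ?", (seen_b + 1,))
    conn.commit()
    return read_revision(conn)  # one increment has been lost


def in_statement_increment(conn: sqlite3.Connection) -> int:
    """Two increments where the read happens inside the UPDATE itself."""
    for _ in range(2):
        conn.execute("UPDATE CLUSTER_GLOBAL_REVISION SET revision = revision + 1")
    conn.commit()
    return read_revision(conn)
```

In this sketch the read-then-write variant ends at revision 1 after two
increments, while the in-statement variant ends at 2, because the database
evaluates "revision + 1" against the current row each time.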

The other warnings are about DELETE and UPDATE statements on the
default_prop and default_node tables where no SELECTs on those tables
have been made within the same transaction. This would likely be a
problem in a typical database environment where there are multiple
concurrent clients accessing and modifying the data, but in
(non-clustered) Jackrabbit all database access for a workspace happens
through a single connection in the database persistence manager
configured for that workspace. Thus there is no need for Jackrabbit to
read the data _during the same transaction_ before deleting or
updating it.

Now, and this may well be a big problem in Jackrabbit, in clustered
mode Jackrabbit actually does allow multiple clients to concurrently
access and modify the underlying database. There is a separate
clustering journal that is used to synchronize the cluster nodes, but
I'm not too certain that all the race conditions are properly
accounted for. I'll branch a separate thread for the details.

2) http://blog.jinspired.com/?p=39

This problem is about two threads using a prepared statement within
the context of the same transaction. Looking deeper into the issue I
think I've identified the cause in DatabaseJournal in Jackrabbit 1.2.x
operating with autocommits disabled but without proper commit() calls
to separate the transactions. There is no need for the operations in
question to execute within the same transaction boundary, so in
Jackrabbit 1.3.x the autocommit mode is enabled.
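
The effect of running with autocommit disabled but without commit() calls
can be sketched as follows. This uses Python's built-in sqlite3 module
rather than JDBC, and it is not the DatabaseJournal code; the journal_demo
table, the entry values and the function name are hypothetical. In sqlite3,
isolation_level=None selects autocommit mode, while "DEFERRED" makes the
module open a transaction implicitly before the first INSERT.

```python
import sqlite3


def run_two_operations(autocommit: bool) -> int:
    """Run two unrelated writes, then roll back; return how many rows survive."""
    conn = sqlite3.connect(
        ":memory:", isolation_level=None if autocommit else "DEFERRED"
    )
    conn.execute("CREATE TABLE journal_demo (entry TEXT)")

    # Operation 1: with autocommit off and no commit(), its transaction stays open.
    conn.execute("INSERT INTO journal_demo VALUES ('operation 1')")

    # Operation 2: logically separate, but it lands in the same open transaction.
    conn.execute("INSERT INTO journal_demo VALUES ('operation 2')")

    # Pretend operation 2 failed and its caller rolls back.
    # In autocommit mode there is no open transaction, so this is a no-op.
    conn.rollback()

    return conn.execute("SELECT COUNT(*) FROM journal_demo").fetchone()[0]
```

With autocommit off, the rollback discards both writes because they shared
one transaction boundary. With autocommit on, each statement is its own
transaction and both rows survive.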

William, I must admit initially misjudging this blog entry based on
looking only at the Jackrabbit 1.3 code (still the trunk at the time).
I couldn't find any place where such cross-thread transaction access
could occur, so I assumed (incorrectly extrapolating from the false
positives on the first blog post) the tool to have just flagged the
Jackrabbit practice of acquiring a single connection and preparing all
required statements, and then using those statements from multiple
(synchronized) threads.

3) http://blog.jinspired.com/?p=40

This is another case of multiple threads operating using the same
transaction, and I believe the root cause is the same incorrect
autocommit setting in Jackrabbit 1.2.x.

> Also one look at a few tx related issues in the jira tracking systems
> reveals fatal flaws in the transaction handling of jcr session level
> operations.

You mean JCR-449 and JCR-566? These are both related to versioning,
and due to architectural constraints in Jackrabbit (versioning store
being referenced by multiple workspaces) we have trouble ensuring
properly transactional (or even concurrent, at least up to Jackrabbit
1.3.3) access to the versioning features.

However, normal level 2 operations should work fine with transactions
in Jackrabbit.

BR,

Jukka Zitting

Re: Is this issue resolved?

Posted by William Louth <wi...@jinspired.com>.

Hi,

The tone reflects my frustration at having to go over this time and time
again. I might as well be talking to the wall when it comes to Jackrabbit.

Our analysis was and is correct, and to date nothing has been put forward to
show otherwise, yet people continue to state the case to be
different with little evidence other than wishful thinking or, worse,
ignorance.

I did indicate there was one issue which I subsequently deemed dubious and
possibly an incorrect assessment by the tool, but all the other issues (4-5)
were valid. We can all make mistakes, but to continue to imply that an
assessment by a person who has worked many years in building tools for
transaction processing analysis was incorrect is unacceptable and
irresponsible to potential customers.

I internally tested our tool again and again before reporting the issues
which by the way had already occurred (but not classified) at the customer
site before JXInsight was brought into the picture.

Also one look at a few tx related issues in the jira tracking systems
reveals fatal flaws in the transaction handling of jcr session level
operations.

regards,

William

Re: Is this issue resolved?

Posted by Padraic Hannon <pi...@wasabicowboy.com>.

There isn't much of a reason to be so condescending. I agree that there 
are some serious issues in the design of the jackrabbit core 
infrastructure and a number of people have chimed in with similar 
thoughts. The best way forward is to provide constructive feedback and 
even patches for issues. I have found that so far the community is very 
welcoming of such things. The JXInsight tool, which I have experimented 
with in the past, is a great tool and can be used to detect nasty bits 
of code. However, I am unsure why the harsh tone is needed (something I have
seen on TSS and other forums, and have so far avoided here). Again, the best
way to make this project successful is to take a hard look at the code
and work towards a resolution. The JCR spec is pretty nice to work
against and it is within all our best interests to ensure that there is 
a healthy successful community supporting it.

aloha
paddy

Re: Is this issue resolved?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/8/07, William Louth <wi...@jinspired.com> wrote:
> For your information these were a real world customer (not anymore) that
> performed some standard benchmarking with a high enough load and had
> corruption problems both at the database and in memory data structures.

We'd be happy to hear more about the problems you encountered.

Trust me, we would very much like to sort out such problems, but
without specific details on when (Jackrabbit version), where
(environment, configuration), and how (client code, logs, exception
stack traces) the problem occurs it is very difficult for us to track
down and fix such issues.

> http://jira.jboss.com/jira/browse/JBPORTAL-1187

:-) Now that's an example of a helpful bug report.

BR,

Jukka Zitting

Re: Is this issue resolved?

Posted by William Louth <wi...@jinspired.com>.

Jukka,

Your level of understanding of transaction processing is truly remarkable. 

If you actually read the blog entry (after taking the blindfold off your
eyes) you would see that in the case reported, JackRabbit did the
unthinkable: it used the same connection across threads while the connection
was already active performing a transaction (autocommit=false) on behalf of
another thread. This is not a JXInsight/Java codified standard; it is an
industry standard in transaction processing.
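
The hazard described here can be reproduced in miniature. The sketch below
uses Python's built-in sqlite3 module and threading, not JDBC, and it is not
taken from Jackrabbit or from the blog entries; the demo_node table, the
thread functions and the event names are all hypothetical. Events force a
fixed ordering so the two threads never touch the connection at the same
instant; the point is only that they share one open transaction.

```python
import sqlite3
import threading


def shared_connection_demo() -> int:
    """Thread B writes on a connection where thread A's transaction is open."""
    # check_same_thread=False lets several threads use the same connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE demo_node (id TEXT)")

    a_started = threading.Event()
    b_done = threading.Event()

    def thread_a() -> None:
        # The first INSERT implicitly opens a transaction (autocommit is off).
        conn.execute("INSERT INTO demo_node VALUES ('written by A')")
        a_started.set()
        b_done.wait()
        conn.rollback()  # A aborts what it believes is only its own work

    def thread_b() -> None:
        a_started.wait()
        # B's write silently joins A's still-open transaction.
        conn.execute("INSERT INTO demo_node VALUES ('written by B')")
        b_done.set()

    threads = [threading.Thread(target=thread_a), threading.Thread(target=thread_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    return conn.execute("SELECT COUNT(*) FROM demo_node").fetchone()[0]
```

After thread A rolls back, the table is empty: thread B's write was lost
even though B never asked for a rollback.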

By the way, there are many other entries on the blog that show even
concurrency at the statement level across threads, with results being closed
inadvertently by other threads.

For your information, this was a real-world customer (not anymore) that
performed some standard benchmarking with a high enough load and had
corruption problems both in the database and in in-memory data structures.

http://jira.jboss.com/jira/browse/JBPORTAL-1187

regards,

William

Re: Is this issue resolved?

Posted by Jukka Zitting <ju...@gmail.com>.

Hi,

On 10/3/07, Peter Travas <pt...@gmail.com> wrote:
> I've hit this page: http://blog.jinspired.com/?p=40 when reading about JSR
> on http://www.infoq.com/articles/spring-modules-jcr
> [...]
> Has this issue been already addressed? If so, in which version (bug id
> reference)?
> It sounds quite serious, I think that you should post a reply on this thread
> because it sounds scary.

There was some discussion about that on TSS some time ago, see the
comments on http://www.theserverside.com/news/thread.tss?thread_id=45763.

It seems that most of the issues raised by JXInsight were just cases
where the Jackrabbit code actually works correctly but doesn't follow
the JDBC/J2EE practices as codified in JXInsight. For example one of
the central practices that JXInsight seems to worry about is not using
a single JDBC connection from multiple different threads. This would
most likely be a big problem if the threads were accessing the
connection concurrently, but in Jackrabbit's case such access is
normally explicitly synchronized - a crucial fact that the JXInsight
analysis doesn't notice.
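
As a rough illustration of "explicitly synchronized" access to one shared
connection, here is a minimal sketch in Python with the built-in sqlite3 and
threading modules. It is an analogy only, not Jackrabbit's persistence
manager; the SynchronizedStore class, the demo_prop table and run_workers
are hypothetical names. Each operation executes and commits while holding
the lock, so no transaction is left open for another thread to fall into.

```python
import sqlite3
import threading


class SynchronizedStore:
    """One shared connection; every operation runs under a single lock."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._lock = threading.Lock()
        self._conn.execute("CREATE TABLE demo_prop (name TEXT)")

    def store(self, name: str) -> None:
        with self._lock:
            self._conn.execute("INSERT INTO demo_prop VALUES (?)", (name,))
            self._conn.commit()  # close the transaction before releasing the lock

    def count(self) -> int:
        with self._lock:
            return self._conn.execute("SELECT COUNT(*) FROM demo_prop").fetchone()[0]


def run_workers(workers: int = 4, writes: int = 25) -> int:
    """Start several writer threads against one store; return the row count."""
    store = SynchronizedStore()

    def work(worker_id: int) -> None:
        for i in range(writes):
            store.store(f"prop-{worker_id}-{i}")

    threads = [threading.Thread(target=work, args=(n,)) for n in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.count()
```

Because the commit happens inside the locked section, every write is
complete before the next thread gets the connection, and all rows from all
workers are present at the end.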

So far we've yet to see a real-world bug report related to the issues
raised in the blog entries you referred to, so I don't consider them
too serious even if these are real problems. I'd certainly welcome
someone to dig deeper into the results. There seems to be a free
developer version of the JXInsight tool available, if someone is
interested.

BR,

Jukka Zitting