You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by Thomas Müller <th...@day.com> on 2010/02/24 19:34:01 UTC

[jr3] Synchronized sessions

Currently, Jackrabbit sessions are somewhat synchronized, but not
completely (for example it's still possible to concurrently read and
update data). There were some problems because of that, and probably
there still are.

I believe it's better to synchronize all methods in the session (on
the session object). This includes methods on nodes and properties and
so on. If this does turn out to be a performance problem, we can
remove synchronization where required (and where it can safely be
removed) or change the implementation (use immutable objects or safe
data structures).

This is more conservative, but I think the impact on performance will
be minimal. Of course performance is important, however I think data
consistency is more important than the possible gain of a few percents
of (read-) performance.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Alexander Klimetschek <ak...@day.com>.
On Wed, Feb 24, 2010 at 19:34, Thomas Müller <th...@day.com> wrote:
> I believe it's better to synchronize all methods in the session (on
> the session object). This includes methods on nodes and properties and
> so on. If this does turn out to be a performance problem, we can
> remove synchronization where required (and where it can safely be
> removed) or change the implementation (use immutable objects or safe
> data structures).

+1

Regards,
Alex

-- 
Alexander Klimetschek
alexander.klimetschek@day.com

Re: [jr3] Synchronized sessions

Posted by Stefan Guggisberg <st...@gmail.com>.
On Wed, Feb 24, 2010 at 7:34 PM, Thomas Müller <th...@day.com> wrote:
> Currently, Jackrabbit sessions are somewhat synchronized, but not
> completely (for example it's still possible to concurrently read and
> update data). There were some problems because of that, and probably
> there still are.
>
> I believe it's better to synchronize all methods in the session (on
> the session object). This includes methods on nodes and properties and
> so on. If this does turn out to be a performance problem, we can
> remove synchronization where required (and where it can safely be
> removed) or change the implementation (use immutable objects or safe
> data structures).
>
> This is more conservative, but I think the impact on performance will
> be minimal. Of course performance is important, however I think data
> consistency is more important than the possible gain of a few percents
> of (read-) performance.

+1

cheers
stefan

>
> Regards,
> Thomas
>

Re: [jr3] Synchronized sessions

Posted by Marcel Reutegger <ma...@gmx.net>.
On Thu, Feb 25, 2010 at 16:03, Carsten Ziegeler <cz...@apache.org> wrote:
> Simply syncing everything on the session would decrease performance in
> these cases dramatically.

jackrabbit already has quite high level synchronization points when it
comes to reading. e.g. the ItemManager is synchronized, which is
called every time an item is accessed. moving that synchronization one
level up in the call stack doesn't make a difference IMO.

but I see this topic more about synchronization for writes. there we
already say that it's not supported. so people won't notice when
there's less concurrency but the repository stays consistent instead.

regards
 marcel

Re: [jr3] Synchronized sessions

Posted by Stefan Guggisberg <st...@gmail.com>.
On Thu, Feb 25, 2010 at 5:10 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> Stefan Guggisberg wrote:
>> On Thu, Feb 25, 2010 at 4:03 PM, Carsten Ziegeler <cz...@apache.org> wrote:
>>> Most jcr apps I've seen often use a single session from several threads
>>> to read from this session. (I think I also read it somewhere that this
>>> is safe with jackrabbit, but I might be mistaken).
>>
>> that's an unsupported api usage. i don't see why would need to support this
>> in a future version.
> I didn't say if jackrabbit should support this in the future or not. :)
> I'm just stating the fact, that huge apps that are built on top of
> jackrabbit make use of this! For example see this thread
> (http://n4.nabble.com/session-pooling-td539577.html#a539577) where it is
> explicitly stated that it's safe to use jackrabbit sessions for multi
> threaded reading. And as I said I've seen such hints and usage a lot.

i can't see any official statement in this thread, only personal opinions
expressed.

AFAIK 'session thread safety' has never been an officially supported
jackrabbit feature.

>
> And if I'm not mistaken, with explictly introducing all these syncs you
> enable multi-threaded read *and* write access.

no, the goal is to minimize the risk of repository corruption.

cheers
stefan

>
>>
>>> Simply syncing everything on the session would decrease performance in
>>> these cases dramatically.
>>
>> is this just a wild guess or do you have figures that prove your claim?
> Now, with the suggested syncing you have a very high level sync point
> where only one client after the other can use the objects.
>
> Carsten
> --
> Carsten Ziegeler
> cziegeler@apache.org
>

Re: [jr3] Synchronized sessions

Posted by Carsten Ziegeler <cz...@apache.org>.
Stefan Guggisberg wrote:
> On Thu, Feb 25, 2010 at 4:03 PM, Carsten Ziegeler <cz...@apache.org> wrote:
>> Most jcr apps I've seen often use a single session from several threads
>> to read from this session. (I think I also read it somewhere that this
>> is safe with jackrabbit, but I might be mistaken).
> 
> that's an unsupported api usage. i don't see why would need to support this
> in a future version.
I didn't say if jackrabbit should support this in the future or not. :)
I'm just stating the fact, that huge apps that are built on top of
jackrabbit make use of this! For example see this thread
(http://n4.nabble.com/session-pooling-td539577.html#a539577) where it is
explicitly stated that it's safe to use jackrabbit sessions for multi
threaded reading. And as I said I've seen such hints and usage a lot.

And if I'm not mistaken, with explictly introducing all these syncs you
enable multi-threaded read *and* write access.

> 
>> Simply syncing everything on the session would decrease performance in
>> these cases dramatically.
> 
> is this just a wild guess or do you have figures that prove your claim?
Now, with the suggested syncing you have a very high level sync point
where only one client after the other can use the objects.

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [jr3] Synchronized sessions

Posted by Thomas Müller <th...@day.com>.
Hi,

> Consider two or more threads reading different items at the same time:
> they all are chained one after the other.

Only if those threads use the same session.

> this is unsupported, yet you want to add synchronization to secure
> this unsupported case ...

When we are done it becomes a supported use case :-)

> I don't have an example off hand

Please let us know if you have one.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Stefan Guggisberg <st...@gmail.com>.
On Thu, Feb 25, 2010 at 4:54 PM, Felix Meschberger <fm...@gmail.com> wrote:
> Hi,
>
> On 25.02.2010 16:34, Stefan Guggisberg wrote:
>> On Thu, Feb 25, 2010 at 4:03 PM, Carsten Ziegeler <cz...@apache.org> wrote:
>>> Most jcr apps I've seen often use a single session from several threads
>>> to read from this session. (I think I also read it somewhere that this
>>> is safe with jackrabbit, but I might be mistaken).
>>
>> that's an unsupported api usage. i don't see why would need to support this
>> in a future version.
>
> Ok, so you want to synchronize (almost) everything to actually support a
> use case which is not supported ? I don't get it ;-)

i'd like to minimize the risk of ending up with a corrupted repository.

>
>>
>>> Simply syncing everything on the session would decrease performance in
>>> these cases dramatically.
>>
>> is this just a wild guess or do you have figures that prove your claim?
>
> This is rather high level of synchronization. Consider two or more
> threads reading different items at the same time: they all are chained
> one after the other.

figures?

>
> Yes, this is unsupported, yet you want to add synchronization to secure
> this unsupported case ...
>
> Regards
> Felix
>
>>
>> cheers
>> stefan
>>
>>>
>>> Carsten
>>> --
>>> Carsten Ziegeler
>>> cziegeler@apache.org
>>>
>>
>

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 25.02.2010 16:34, Stefan Guggisberg wrote:
> On Thu, Feb 25, 2010 at 4:03 PM, Carsten Ziegeler <cz...@apache.org> wrote:
>> Most jcr apps I've seen often use a single session from several threads
>> to read from this session. (I think I also read it somewhere that this
>> is safe with jackrabbit, but I might be mistaken).
> 
> that's an unsupported api usage. i don't see why would need to support this
> in a future version.

Ok, so you want to synchronize (almost) everything to actually support a
use case which is not supported ? I don't get it ;-)

> 
>> Simply syncing everything on the session would decrease performance in
>> these cases dramatically.
> 
> is this just a wild guess or do you have figures that prove your claim?

This is rather high level of synchronization. Consider two or more
threads reading different items at the same time: they all are chained
one after the other.

Yes, this is unsupported, yet you want to add synchronization to secure
this unsupported case ...

Regards
Felix

> 
> cheers
> stefan
> 
>>
>> Carsten
>> --
>> Carsten Ziegeler
>> cziegeler@apache.org
>>
> 

Re: [jr3] Synchronized sessions

Posted by Stefan Guggisberg <st...@gmail.com>.
On Thu, Feb 25, 2010 at 4:03 PM, Carsten Ziegeler <cz...@apache.org> wrote:
> Most jcr apps I've seen often use a single session from several threads
> to read from this session. (I think I also read it somewhere that this
> is safe with jackrabbit, but I might be mistaken).

that's an unsupported api usage. i don't see why would need to support this
in a future version.

> Simply syncing everything on the session would decrease performance in
> these cases dramatically.

is this just a wild guess or do you have figures that prove your claim?

cheers
stefan

>
> Carsten
> --
> Carsten Ziegeler
> cziegeler@apache.org
>

Re: [jr3] Synchronized sessions

Posted by Thomas Müller <th...@day.com>.
Hi,

>> "consistency". I don't know of a relational database that allows you
>> to violate referential integrity, unique key constraints, or check
>> constraints - simply by using the same connection in multiple threads.
> jcr repository should have some point to do the constraints check as
> well. Should fail the operation if conflict found.

The easiest way to achieve internal integrity is to synchronize on an
object. Synchronizing on the session object is much easier, and costs
much less, than

1) allowing to corrupt internal states,
2) but then somehow detect the corruption
3) and then trying to fix such problems later on.

> Performance is not major concern, it's the design.

For me, performance _is_ a major concern. But reliability is more important.

> Synchronisation should be limited and should be applied to low level where necessary

There is an overhead for each "synchronize". If you synchronize on a
very low level, the cost potentially higher than if you synchronize on
a higher level. Because you have to synchronize more. Please note
"scalability" doesn't apply in this context: if you want to do stuff
concurrently, then use multiple sessions.

> instead blindly on session for everything.

I don't suggest to synchronize "blindly". I suggest to synchronize
"with open eyes" :-)

> Sync on session level could increase the deadlock as well.

No, the opposite: if every method is synchronized on the same object,
it will decrease the probability of deadlocks.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Guo Du <mr...@gmail.com>.
On Sat, Feb 27, 2010 at 8:47 AM, Thomas Müller <th...@day.com> wrote:
> "consistency". I don't know of a relational database that allows you
> to violate referential integrity, unique key constraints, or check
> constraints - simply by using the same connection in multiple threads.
jdbc server response for constraints check and will throw exception at the end.

jcr repository should have some point to do the constraints check as
well. Should fail the operation if conflict found.


> Jackrabbit did and does have such problems (nodes that point to
> non-existing parent nodes; nodes that point to non-existing child
> nodes). *Those* are the problems I want to solve.


> It should never be necessary to run a "consistency check" or
> "consistency fix". It should never be necessary to delete nodes
> because they are "corrupt". Nodes should never get corrupt.
Agree, it doesn't matter what developer do via jcr api, data should
never be corrupted in repository.

>> But not with synchronizing all methods.
> As I already wrote, if this does turn out to be a performance problem,
> we can remove synchronization where required.
Performance is not major concern, it's the design. Synchronisation
should be limited and should be applied to low level where necessary,
instead blindly on session for everything. . Sync on session level
could increase the deadlock as well.

Your response could well applied to the sync everything on seesion:)
Unfortunately, it's not that simple to find out whose program caused
the problem. Usually other people have to fix the problem than those
who caused them.

-Guo

Re: [jr3] Synchronized sessions

Posted by Thomas Müller <th...@day.com>.
Hi,

> jdbc connection is not thread safe. jcr session works similar way and I prefer follow the same pattern.

Me too. But there is a difference between "thread safety" and
"consistency". I don't know of a relational database that allows you
to violate referential integrity, unique key constraints, or check
constraints - simply by using the same connection in multiple threads.
See also http://en.wikipedia.org/wiki/ACID

Jackrabbit did and does have such problems (nodes that point to
non-existing parent nodes; nodes that point to non-existing child
nodes). *Those* are the problems I want to solve.

Jackrabbit shouldn't try to protect an application from storing the
"wrong" data. It can't. Application developers are responsible for
ensuring application level  consistency (this sentence stolen from
Wikipedia).

> To what avail?

It should never be necessary to run a "consistency check" or
"consistency fix". It should never be necessary to delete nodes
because they are "corrupt". Nodes should never get corrupt.

> programmers ... If they do not, it is their fault and they have to live with the consequences of their doing the wrong thing.

Unfortunately, it's not that simple to find out whose program caused
the problem. Usually other people have to fix the problem than those
who caused them.

> But not with synchronizing all methods.

As I already wrote, if this does turn out to be a performance problem,
we can remove synchronization where required.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Guo Du <mr...@gmail.com>.
On Fri, Feb 26, 2010 at 6:11 PM, Jukka Zitting <ju...@gmail.com> wrote:
> All we're trying to achieve here is ensure internal consistency even
> when clients do something like the above (for whatever reason,
> intentional or not).

jdbc connection is not thread safe.

jcr session works similar way and I prefer follow the same pattern.

We should promote developer to do the right thing instead of
technically encourage them to do bad design. Shared session is only
useful for read access in some case, if it's related to write, should
not share the session.

-Guo

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 26.02.2010 19:11, Jukka Zitting wrote:
> Hi,
> 
> On Fri, Feb 26, 2010 at 6:36 PM, Felix Meschberger <fm...@gmail.com> wrote:
>> Consider two threads T1 and T2 each modifying data from the same session:
>>
>>  T1 makes some modifications
>>  T2 makes some modifications
>>  T1 saves the session (incl. both T1's and T2's modifs)
>>  T2 makes some more modifications
>>  T2 decides to rollback
>>
>> At the end the content is inconsistent from the POV of T2 because some
>> modifications have been persistent and some haven't.
> 
> This has nothing to do with synchronizing session access. If T2 wants
> a separate transient space, it should use a separate session.
> 
> All we're trying to achieve here is ensure internal consistency even
> when clients do something like the above (for whatever reason,
> intentional or not).

To what avail ?

Quoting Thomas Müller:

> In some cases, incorrect usage leads to data corruption. I
> believe data corrupt is not acceptable, even if the user made a
> mistake.

Now, you say, actual data consistency is not the goal, but internal
concistency is. All end-users (not the ones doing the coding) really
care about is data consistency. They don't care for a distinciton of
internal and external consistency.

Plus: This is *not* about users like my grand-mother who never touched a
computer in her life. This is about programmers who must adhere to a
programming model and to API contracts. If they do not, it is their
fault and they have to live with the consequences of their doing the
wrong thing.

If a Session can do better when used concurrently, fine. But not with
synchronizing all methods.

E.g.: How about taking note of the current thread when the transient
space is first used by a thread making modifications. As soon as another
thread is trying to use the same transient space, an exception might be
thrown. This way the transient space is "owned" by a session until
refresh or commit.

This is IMHO super-simple, fast and safe.

The only thing to care about -- and find a solution -- is, that the
Session might effectively become read-only if a thread starts modifying
content and then abandons without commit or refresh.

Regards
Felix


Re: [jr3] Synchronized sessions

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Fri, Feb 26, 2010 at 6:36 PM, Felix Meschberger <fm...@gmail.com> wrote:
> Consider two threads T1 and T2 each modifying data from the same session:
>
>  T1 makes some modifications
>  T2 makes some modifications
>  T1 saves the session (incl. both T1's and T2's modifs)
>  T2 makes some more modifications
>  T2 decides to rollback
>
> At the end the content is inconsistent from the POV of T2 because some
> modifications have been persistent and some haven't.

This has nothing to do with synchronizing session access. If T2 wants
a separate transient space, it should use a separate session.

All we're trying to achieve here is ensure internal consistency even
when clients do something like the above (for whatever reason,
intentional or not).

BR,

Jukka Zitting

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 25.02.2010 16:33, Thomas Müller wrote:
>> this creates a big potential for deadlocks
> 
> Could you provide an example on how such a deadlock could look like?

While I could not yet come up with such an example, I found another
problem involving data inconsistency.

Consider two threads T1 and T2 each modifying data from the same session:

  T1 makes some modifications
  T2 makes some modifications
  T1 saves the session (incl. both T1's and T2's modifs)
  T2 makes some more modifications
  T2 decides to rollback

At the end the content is inconsistent from the POV of T2 because some
modifications have been persistent and some haven't.

Thus, method synchronization gives a false impression of thread safety.

Regards
Felix

> 
>> just synchronizing all methods
>> So you also synchronize all Node/Item/Property methods
> 
> Some methods don't need to be synchronized, for example some getter
> methods such as Session.getRepository(), RangeIterator.getPosition()
> and getSize(). I'm not sure if Node.getProperty needs to be
> synchronized. The Value class is (almost) immutable so synchronization
> is not required here. But very likely Session.getNode(..) and
> Node.getNode() need to be synchronized because those potentially
> modify the cache.
> 
>> ensure that for a given Item x, the same Item instance is always returned from all getXXX methods ....
> 
> I'm not sure what you are referring to. Jackrabbit already does ensure
> the same node object is returned as far as I know, but for other
> reasons than synchronization.
> 
>> if people do the wrong things, well, fine, let them do ...
> 
> It's usually not those people that have to fix broken repositories.
> 
>> my veto
> 
> Let's see.
> 
>> Most jcr apps I've seen often use a single session from several threads to read from this session. (I think I also read it somewhere that this is safe with jackrabbit, but I might be mistaken).
> 
> I'm not sure if this is really safe. Maybe it is problematic if one
> thread uses the same session for updates.
> 
>> Simply syncing everything on the session would decrease performance in these cases dramatically.
> 
> Actually, I don't think that's the case.
> 
> Regards,
> Thomas
> 


Re: [jr3] Synchronized sessions

Posted by Andrey Adamovich <an...@yahoo.com>.
Hi!

+1 for all that Thomas said.

 Andrey 




________________________________
From: Thomas Müller <th...@day.com>
To: dev@jackrabbit.apache.org
Sent: Thu, 25 February, 2010 22:24:13
Subject: Re: [jr3] Synchronized sessions

Hi

> http://issues.apache.org/jira/browse/JCR-2443.

Unfortunately this bug doesn't have a test case. Also I didn't find a
thread dump that shows what the problem was exactly. I can't say what
was the problem there.

Observation is definitely an area where synchronization can
potentially lead to deadlocks. Maybe observation needs to use its own
session(s) so that it can't block. This is not a new issue however:
most writes are already synchronized (not all writes however).

I'm hesitant to change synchronization with the current
implementation: doing that would very likely lead to Java level
deadlocks. We need to make sure synchronization is always done on the
same level, and in the same order. With the current implementation,
that's challenging.

Of course performance and concurrency is very important. But the
current approach (mutable data structures, some writes are
synchronized) is quite dangerous. Instead, immutable data structures
should be used, at least for values and objects in the shared cache.
Everything else should be properly synchronized if mutable, or - if
that's too slow - the proper data structures should be used, for
example ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet.

Regards,
Thomas



      

Re: [jr3] Synchronized sessions

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Feb 25, 2010 at 10:24 PM, Thomas Müller <th...@day.com> wrote:
>> http://issues.apache.org/jira/browse/JCR-2443.
>
> Unfortunately this bug doesn't have a test case. Also I didn't find a
> thread dump that shows what the problem was exactly. I can't say what
> was the problem there.

This problem was caused by observation delivery trying to synchronize
on the receiving session while that session was waiting for an
internal lock that the event source was still holding. I had a thread
dump and could reproduce this behaviour in our CQ5 product (search for
JCR-2443 in our issue tracker) where it occurred as a result of heavy
concurrent writing and observation, but writing a standalone test case
proved quite challenging.

The good thing about this is that without the synchronization we would
sooner or later have seen cases where the state of the internal map
got corrupted due to concurrent updates. Such problems would have been
much more difficult to troubleshoot than the deadlock that clearly
pointed to the incorrect locking order. I'm quite sure that some of
the hairier Jackrabbit issues we see reported may in fact be caused by
exactly such concurrent session access without proper synchronization.

BR,

Jukka Zitting

Re: [jr3] Synchronized sessions

Posted by Thomas Müller <th...@day.com>.
Hi

> http://issues.apache.org/jira/browse/JCR-2443.

Unfortunately this bug doesn't have a test case. Also I didn't find a
thread dump that shows what the problem was exactly. I can't say what
was the problem there.

Observation is definitely an area where synchronization can
potentially lead to deadlocks. Maybe observation needs to use its own
session(s) so that it can't block. This is not a new issue however:
most writes are already synchronized (not all writes however).

I'm hesitant to change synchronization with the current
implementation: doing that would very likely lead to Java level
deadlocks. We need to make sure synchronization is always done on the
same level, and in the same order. With the current implementation,
that's challenging.

Of course performance and concurrency is very important. But the
current approach (mutable data structures, some writes are
synchronized) is quite dangerous. Instead, immutable data structures
should be used, at least for values and objects in the shared cache.
Everything else should be properly synchronized if mutable, or - if
that's too slow - the proper data structures should be used, for
example ConcurrentHashMap, CopyOnWriteArrayList, CopyOnWriteArraySet.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Bart van der Schans <b....@onehippo.com>.
On Thu, Feb 25, 2010 at 4:33 PM, Thomas Müller <th...@day.com> wrote:
> Hi,
>
>> this creates a big potential for deadlocks
>
> Could you provide an example on how such a deadlock could look like?
We did have a deadlock issue with observations which called
Session.getNamespaceURI() and Session.getNamespacePrefix() which were
synchronized on the session. See
http://issues.apache.org/jira/browse/JCR-2443.

Regards,
Bart

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 25.02.2010 16:33, Thomas Müller wrote:
> Hi,
> 
>> this creates a big potential for deadlocks
> 
> Could you provide an example on how such a deadlock could look like?

I don't have an example off hand, and as Jukka said, it might well be,
that there might be none. Still the potential exists, especially if the
session is not fully synchronized...

> 
>> just synchronizing all methods
>> So you also synchronize all Node/Item/Property methods
> 
> Some methods don't need to be synchronized, for example some getter
> methods such as Session.getRepository(), RangeIterator.getPosition()
> and getSize(). I'm not sure if Node.getProperty needs to be
> synchronized. The Value class is (almost) immutable so synchronization
> is not required here. But very likely Session.getNode(..) and
> Node.getNode() need to be synchronized because those potentially
> modify the cache.
> 
>> ensure that for a given Item x, the same Item instance is always returned from all getXXX methods ....
> 
> I'm not sure what you are referring to. Jackrabbit already does ensure
> the same node object is returned as far as I know, but for other
> reasons than synchronization.
> 
>> if people do the wrong things, well, fine, let them do ...
> 
> It's usually not those people that have to fix broken repositories.
> 
>> my veto
> 
> Let's see.

;-)


Regards
Felix

> 
>> Most jcr apps I've seen often use a single session from several threads to read from this session. (I think I also read it somewhere that this is safe with jackrabbit, but I might be mistaken).
> 
> I'm not sure if this is really safe. Maybe it is problematic if one
> thread uses the same session for updates.
> 
>> Simply syncing everything on the session would decrease performance in these cases dramatically.
> 
> Actually, I don't think that's the case.
> 
> Regards,
> Thomas
> 


Re: [jr3] Synchronized sessions

Posted by Thomas Müller <th...@day.com>.
Hi,

> this creates a big potential for deadlocks

Could you provide an example on how such a deadlock could look like?

> just synchronizing all methods
> So you also synchronize all Node/Item/Property methods

Some methods don't need to be synchronized, for example some getter
methods such as Session.getRepository(), RangeIterator.getPosition()
and getSize(). I'm not sure if Node.getProperty needs to be
synchronized. The Value class is (almost) immutable so synchronization
is not required here. But very likely Session.getNode(..) and
Node.getNode() need to be synchronized because those potentially
modify the cache.

> ensure that for a given Item x, the same Item instance is always returned from all getXXX methods ....

I'm not sure what you are referring to. Jackrabbit already does ensure
the same node object is returned as far as I know, but for other
reasons than synchronization.

> if people do the wrong things, well, fine, let them do ...

It's usually not those people that have to fix broken repositories.

> my veto

Let's see.

> Most jcr apps I've seen often use a single session from several threads to read from this session. (I think I also read it somewhere that this is safe with jackrabbit, but I might be mistaken).

I'm not sure if this is really safe. Maybe it is problematic if one
thread uses the same session for updates.

> Simply syncing everything on the session would decrease performance in these cases dramatically.

Actually, I don't think that's the case.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Carsten Ziegeler <cz...@apache.org>.
Most jcr apps I've seen often use a single session from several threads
to read from this session. (I think I also read it somewhere that this
is safe with jackrabbit, but I might be mistaken).
Simply syncing everything on the session would decrease performance in
these cases dramatically.

Carsten
-- 
Carsten Ziegeler
cziegeler@apache.org

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 26.02.2010 14:38, Marcel Reutegger wrote:
> Hi,
> 
> On Thu, Feb 25, 2010 at 19:14, Felix Meschberger <fm...@gmail.com> wrote:
>> Hi,
>>
>> On 25.02.2010 17:55, Marcel Reutegger wrote:
>>> Hi,
>>>
>>> On Thu, Feb 25, 2010 at 15:49, Felix Meschberger <fm...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> On 24.02.2010 21:19, Thomas Müller wrote:
>>>>> Hi,
>>>>>
>>>>>> deadlocks
>>>>>
>>>>> I think it's relatively simple to synchronize all methods on the session.
>>>>
>>>> Yes, but this creates a big potential for deadlocks ...
>>>>
>>>>>
>>>>>> If we want to make sessions thread-safe, we should use proper
>>>>>> implementations.
>>>>>
>>>>> Yes, that's what I want to write: a proper implementation.
>>>>
>>>> I disagree that this would be a proper implementation.
>>>
>>> can you please elaborate what you think is a proper implementation in
>>> this context?
>>
>> Just off-the-top-of-my-head: Using a better read-mostly guarding locking
>> mechanism (i.e. readers don't block each other, writers need exclusive
>> access [still not entirely save]); not at the "global" method level, but
>> more intelligently guarding the shared data; not using the Session
>> object itself for locking
> 
> that's pretty much what we do currently. we have read-write locks in
> various places. the problem with that approach is that the sequence of
> lock acquisition is very important. such fine grained locking is very
> difficult to control and lead to various deadlock situation that were
> hard to analyse and fix. IMO we should get rid of as many of those
> locks as we can and synchronize on a more coarse grained level, which
> is easier to maintain and more predictable.

The problem is, that the code is very complicated, not to say convoluted
in places. Combined with mulitple locking mechanisms (JVM locks and
read-write locks) this creates much potential for deadlocks, right.

But to take the sledge hammer of synchronizing all methods out of the
box IMVHO is the wrong way to go.

Regards
Felix

> 
> regards
>  marcel
> 
>> Regards
>> Felix
>>
>>>
>>> regards
>>>  marcel
>>>
>>>>>
>>>>>> any concurrent use of the same session is unsupported.
>>>>>
>>>>> The disadvantage of this is that there is no way to enforce correct
>>>>> usage. In some cases, incorrect usage leads to data corruption. I
>>>>> believe data corrupt is not acceptable, even if the user made a
>>>>> mistake.
>>>>
>>>> Anything can go wrong -- and if people do the wrong things, well, fine,
>>>> let them do ...
>>>>
>>>> And I don't say, we should not make Session thread-safe. But if we set
>>>> out to do it, we should do it right. And just synchronizing all methods
>>>> is just not right.
>>>>
>>>> Regards
>>>> Felix
>>>>
>>>>
>>>
>>
>>
> 


Re: [jr3] Synchronized sessions

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

On Thu, Feb 25, 2010 at 19:14, Felix Meschberger <fm...@gmail.com> wrote:
> Hi,
>
> On 25.02.2010 17:55, Marcel Reutegger wrote:
>> Hi,
>>
>> On Thu, Feb 25, 2010 at 15:49, Felix Meschberger <fm...@gmail.com> wrote:
>>> Hi,
>>>
>>> On 24.02.2010 21:19, Thomas Müller wrote:
>>>> Hi,
>>>>
>>>>> deadlocks
>>>>
>>>> I think it's relatively simple to synchronize all methods on the session.
>>>
>>> Yes, but this creates a big potential for deadlocks ...
>>>
>>>>
>>>>> If we want to make sessions thread-safe, we should use proper
>>>>> implementations.
>>>>
>>>> Yes, that's what I want to write: a proper implementation.
>>>
>>> I disagree that this would be a proper implementation.
>>
>> can you please elaborate what you think is a proper implementation in
>> this context?
>
> Just off-the-top-of-my-head: Using a better read-mostly guarding locking
> mechanism (i.e. readers don't block each other, writers need exclusive
> access [still not entirely save]); not at the "global" method level, but
> more intelligently guarding the shared data; not using the Session
> object itself for locking

that's pretty much what we do currently. we have read-write locks in
various places. the problem with that approach is that the sequence of
lock acquisition is very important. such fine grained locking is very
difficult to control and lead to various deadlock situation that were
hard to analyse and fix. IMO we should get rid of as many of those
locks as we can and synchronize on a more coarse grained level, which
is easier to maintain and more predictable.

regards
 marcel

> Regards
> Felix
>
>>
>> regards
>>  marcel
>>
>>>>
>>>>> any concurrent use of the same session is unsupported.
>>>>
>>>> The disadvantage of this is that there is no way to enforce correct
>>>> usage. In some cases, incorrect usage leads to data corruption. I
>>>> believe data corrupt is not acceptable, even if the user made a
>>>> mistake.
>>>
>>> Anything can go wrong -- and if people do the wrong things, well, fine,
>>> let them do ...
>>>
>>> And I don't say, we should not make Session thread-safe. But if we set
>>> out to do it, we should do it right. And just synchronizing all methods
>>> is just not right.
>>>
>>> Regards
>>> Felix
>>>
>>>
>>
>
>

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 25.02.2010 17:55, Marcel Reutegger wrote:
> Hi,
> 
> On Thu, Feb 25, 2010 at 15:49, Felix Meschberger <fm...@gmail.com> wrote:
>> Hi,
>>
>> On 24.02.2010 21:19, Thomas Müller wrote:
>>> Hi,
>>>
>>>> deadlocks
>>>
>>> I think it's relatively simple to synchronize all methods on the session.
>>
>> Yes, but this creates a big potential for deadlocks ...
>>
>>>
>>>> If we want to make sessions thread-safe, we should use proper
>>>> implementations.
>>>
>>> Yes, that's what I want to write: a proper implementation.
>>
>> I disagree that this would be a proper implementation.
> 
> can you please elaborate what you think is a proper implementation in
> this context?

Just off-the-top-of-my-head: Using a better read-mostly guarding locking
mechanism (i.e. readers don't block each other, writers need exclusive
access [still not entirely save]); not at the "global" method level, but
more intelligently guarding the shared data; not using the Session
object itself for locking

Regards
Felix

> 
> regards
>  marcel
> 
>>>
>>>> any concurrent use of the same session is unsupported.
>>>
>>> The disadvantage of this is that there is no way to enforce correct
>>> usage. In some cases, incorrect usage leads to data corruption. I
>>> believe data corrupt is not acceptable, even if the user made a
>>> mistake.
>>
>> Anything can go wrong -- and if people do the wrong things, well, fine,
>> let them do ...
>>
>> And I don't say, we should not make Session thread-safe. But if we set
>> out to do it, we should do it right. And just synchronizing all methods
>> is just not right.
>>
>> Regards
>> Felix
>>
>>
> 


Re: [jr3] Synchronized sessions

Posted by Marcel Reutegger <ma...@gmx.net>.
Hi,

On Thu, Feb 25, 2010 at 15:49, Felix Meschberger <fm...@gmail.com> wrote:
> Hi,
>
> On 24.02.2010 21:19, Thomas Müller wrote:
>> Hi,
>>
>>> deadlocks
>>
>> I think it's relatively simple to synchronize all methods on the session.
>
> Yes, but this creates a big potential for deadlocks ...
>
>>
>>> If we want to make sessions thread-safe, we should use proper
>>> implementations.
>>
>> Yes, that's what I want to write: a proper implementation.
>
> I disagree that this would be a proper implementation.

can you please elaborate what you think is a proper implementation in
this context?

regards
 marcel

>>
>>> any concurrent use of the same session is unsupported.
>>
>> The disadvantage of this is that there is no way to enforce correct
>> usage. In some cases, incorrect usage leads to data corruption. I
>> believe data corrupt is not acceptable, even if the user made a
>> mistake.
>
> Anything can go wrong -- and if people do the wrong things, well, fine,
> let them do ...
>
> And I don't say, we should not make Session thread-safe. But if we set
> out to do it, we should do it right. And just synchronizing all methods
> is just not right.
>
> Regards
> Felix
>
>

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 24.02.2010 21:19, Thomas Müller wrote:
> Hi,
> 
>> deadlocks
> 
> I think it's relatively simple to synchronize all methods on the session.

Yes, but this creates a big potential for deadlocks ...

> 
>> If we want to make sessions thread-safe, we should use proper
>> implementations.
> 
> Yes, that's what I want to write: a proper implementation.

I disagree that this would be a proper implementation.

> 
>> any concurrent use of the same session is unsupported.
> 
> The disadvantage of this is that there is no way to enforce correct
> usage. In some cases, incorrect usage leads to data corruption. I
> believe data corrupt is not acceptable, even if the user made a
> mistake.

Anything can go wrong -- and if people do the wrong things, well, fine,
let them do ...

And I don't say, we should not make Session thread-safe. But if we set
out to do it, we should do it right. And just synchronizing all methods
is just not right.

Regards
Felix


Re: [jr3] Synchronized sessions

Posted by Thomas Müller <th...@day.com>.
Hi,

> deadlocks

I think it's relatively simple to synchronize all methods on the session.

> If we want to make sessions thread-safe, we should use proper
> implementations.

Yes, that's what I want to write: a proper implementation.

> any concurrent use of the same session is unsupported.

The disadvantage of this is that there is no way to enforce correct
usage. In some cases, incorrect usage leads to data corruption. I
believe data corrupt is not acceptable, even if the user made a
mistake.

Regards,
Thomas

Re: [jr3] Synchronized sessions

Posted by Stefan Guggisberg <st...@gmail.com>.
On Thu, Feb 25, 2010 at 3:59 PM, Felix Meschberger <fm...@gmail.com> wrote:
> Hi,
>
> On 25.02.2010 10:05, Jukka Zitting wrote:
>> Hi,
>>
>> On Thu, Feb 25, 2010 at 9:22 AM, Stefan Guggisberg
>> <st...@gmail.com> wrote:
>>> On Wed, Feb 24, 2010 at 9:03 PM, Felix Meschberger <fm...@gmail.com> wrote:
>>>> A big -1 (I already see the deadlocks in front of me)
>>>
>>> if there are deadlocks caused by such a change then the api's clearly being
>>> misused (i.e. concurrent use of session instances) and there's a risk of data
>>> corruption.
>>
>> In fact, even concurrent use of sessions couldn't trigger deadlocks in
>> this case, since for a deadlock you'd need something that already has
>> locked a deeper repository resource to try locking the session. The
>> only cases I can think of where such a reverse order of locking is
>> possible are observation listeners and item state change
>> notifications.
>
> So you also synchronize all Node/Item/Property methods and ensure that
> for a given Item x, the same Item instance is always returned from all
> getXXX methods ....

why? (almost) every method would just need to synchronize on the session
instance.

cheers
stefan

>
> The longer I think of it, the longer I am scared and the longer I am
> convinced, that this will be a situation, that has the potential of
> getting my veto....
>
> Regards
> Felix
>
>>
>> It's feasible that we still have some lurking concurrency issues in
>> those areas (there were a few that we already fixed, see JCR-2443 for
>> an example), but I prefer deadlocks (that give you easily traceable
>> thread dumps) to potential cases of corrupted internal state (that are
>> notoriously difficult to troubleshoot).
>>
>> So yeah, +1 to Thomas' suggestion. We shouldn't be relying on client
>> applications to maintain proper concurrency control.
>>
>> BR,
>>
>> Jukka Zitting
>>
>

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
Hi,

On 25.02.2010 10:05, Jukka Zitting wrote:
> Hi,
> 
> On Thu, Feb 25, 2010 at 9:22 AM, Stefan Guggisberg
> <st...@gmail.com> wrote:
>> On Wed, Feb 24, 2010 at 9:03 PM, Felix Meschberger <fm...@gmail.com> wrote:
>>> A big -1 (I already see the deadlocks in front of me)
>>
>> if there are deadlocks caused by such a change then the api's clearly being
>> misused (i.e. concurrent use of session instances) and there's a risk of data
>> corruption.
> 
> In fact, even concurrent use of sessions couldn't trigger deadlocks in
> this case, since for a deadlock you'd need something that already has
> locked a deeper repository resource to try locking the session. The
> only cases I can think of where such a reverse order of locking is
> possible are observation listeners and item state change
> notifications.

So you also synchronize all Node/Item/Property methods and ensure that
for a given Item x, the same Item instance is always returned from all
getXXX methods ....

The longer I think of it, the longer I am scared and the longer I am
convinced, that this will be a situation, that has the potential of
getting my veto....

Regards
Felix

> 
> It's feasible that we still have some lurking concurrency issues in
> those areas (there were a few that we already fixed, see JCR-2443 for
> an example), but I prefer deadlocks (that give you easily traceable
> thread dumps) to potential cases of corrupted internal state (that are
> notoriously difficult to troubleshoot).
> 
> So yeah, +1 to Thomas' suggestion. We shouldn't be relying on client
> applications to maintain proper concurrency control.
> 
> BR,
> 
> Jukka Zitting
> 

Re: [jr3] Synchronized sessions

Posted by Marcel Reutegger <ma...@gmx.net>.
On Thu, Feb 25, 2010 at 10:05, Jukka Zitting <ju...@gmail.com> wrote:
> Hi,
>
> On Thu, Feb 25, 2010 at 9:22 AM, Stefan Guggisberg
> <st...@gmail.com> wrote:
>> On Wed, Feb 24, 2010 at 9:03 PM, Felix Meschberger <fm...@gmail.com> wrote:
>>> A big -1 (I already see the deadlocks in front of me)
>>
>> if there are deadlocks caused by such a change then the api's clearly being
>> misused (i.e. concurrent use of session instances) and there's a risk of data
>> corruption.
>
> In fact, even concurrent use of sessions couldn't trigger deadlocks in
> this case, since for a deadlock you'd need something that already has
> locked a deeper repository resource to try locking the session. The
> only cases I can think of where such a reverse order of locking is
> possible are observation listeners and item state change
> notifications.
>
> It's feasible that we still have some lurking concurrency issues in
> those areas (there were a few that we already fixed, see JCR-2443 for
> an example), but I prefer deadlocks (that give you easily traceable
> thread dumps) to potential cases of corrupted internal state (that are
> notoriously difficult to troubleshoot).
>
> So yeah, +1 to Thomas' suggestion. We shouldn't be relying on client
> applications to maintain proper concurrency control.

+1

and I actually think we should do this already in 2.x. I've seen to
many concurrent use of sessions and data corruptions because of that,
even though we say it's not allowed. Fixing an inconsistent workspace
is a pain and finding the guilty code is usually very difficult.

regards
 marcel

> BR,
>
> Jukka Zitting
>

Re: [jr3] Synchronized sessions

Posted by Jukka Zitting <ju...@gmail.com>.
Hi,

On Thu, Feb 25, 2010 at 9:22 AM, Stefan Guggisberg
<st...@gmail.com> wrote:
> On Wed, Feb 24, 2010 at 9:03 PM, Felix Meschberger <fm...@gmail.com> wrote:
>> A big -1 (I already see the deadlocks in front of me)
>
> if there are deadlocks caused by such a change then the api's clearly being
> misused (i.e. concurrent use of session instances) and there's a risk of data
> corruption.

In fact, even concurrent use of sessions couldn't trigger deadlocks in
this case, since for a deadlock you'd need something that already has
locked a deeper repository resource to try locking the session. The
only cases I can think of where such a reverse order of locking is
possible are observation listeners and item state change
notifications.

It's feasible that we still have some lurking concurrency issues in
those areas (there were a few that we already fixed, see JCR-2443 for
an example), but I prefer deadlocks (that give you easily traceable
thread dumps) to potential cases of corrupted internal state (that are
notoriously difficult to troubleshoot).

So yeah, +1 to Thomas' suggestion. We shouldn't be relying on client
applications to maintain proper concurrency control.

BR,

Jukka Zitting

Re: [jr3] Synchronized sessions

Posted by Stefan Guggisberg <st...@gmail.com>.
On Wed, Feb 24, 2010 at 9:03 PM, Felix Meschberger <fm...@gmail.com> wrote:
> A big -1 (I already see the deadlocks in front of me)

if there are deadlocks caused by such a change then the api's clearly being
misused (i.e. concurrent use of session instances) and there's a risk of data
corruption.

it's IMO better to detect and fix such improper api usage.

cheers
stefan

>
> If we want to make sessions thread-safe, we should use proper
> implementations. On the other hand, it is probably really better to
> stick with the current setup of "a session is not thread-safe" and any
> concurrent use of the same session is unsupported.
>
> Regards
> Felix
>
> On 24.02.2010 18:34, Thomas Müller wrote:
>> Currently, Jackrabbit sessions are somewhat synchronized, but not
>> completely (for example it's still possible to concurrently read and
>> update data). There were some problems because of that, and probably
>> there still are.
>>
>> I believe it's better to synchronize all methods in the session (on
>> the session object). This includes methods on nodes and properties and
>> so on. If this does turn out to be a performance problem, we can
>> remove synchronization where required (and where it can safely be
>> removed) or change the implementation (use immutable objects or safe
>> data structures).
>>
>> This is more conservative, but I think the impact on performance will
>> be minimal. Of course performance is important, however I think data
>> consistency is more important than the possible gain of a few percents
>> of (read-) performance.
>>
>> Regards,
>> Thomas
>>
>
>

Re: [jr3] Synchronized sessions

Posted by Felix Meschberger <fm...@gmail.com>.
A big -1 (I already see the deadlocks in front of me)

If we want to make sessions thread-safe, we should use proper
implementations. On the other hand, it is probably really better to
stick with the current setup of "a session is not thread-safe" and any
concurrent use of the same session is unsupported.

Regards
Felix

On 24.02.2010 18:34, Thomas Müller wrote:
> Currently, Jackrabbit sessions are somewhat synchronized, but not
> completely (for example it's still possible to concurrently read and
> update data). There were some problems because of that, and probably
> there still are.
> 
> I believe it's better to synchronize all methods in the session (on
> the session object). This includes methods on nodes and properties and
> so on. If this does turn out to be a performance problem, we can
> remove synchronization where required (and where it can safely be
> removed) or change the implementation (use immutable objects or safe
> data structures).
> 
> This is more conservative, but I think the impact on performance will
> be minimal. Of course performance is important, however I think data
> consistency is more important than the possible gain of a few percents
> of (read-) performance.
> 
> Regards,
> Thomas
>