Posted to users@jena.apache.org by Dick Murray <da...@gmail.com> on 2017/03/24 10:51:19 UTC

Understanding DatasetGraph getLock() (DatasetGraphInMem throwing a curve ball)...

Hi.

Is there a way to find out which Transactional a DatasetGraph is using and,
specifically, which Lock semantics are in force?

As part of a distributed DatasetGraph implementation I have a
DatasetGraphTry wrapper which adds Boolean tryBegin(ReadWrite). As the
name suggests, it tries to lock the given DatasetGraph and returns
immediately, i.e. it does not block. Internally, if it acquires the lock it
calls the wrapped void begin(ReadWrite), which "should" not block. This is
useful because I can round-robin the DatasetGraphs which constitute the
distribution without blocking, especially as some of the DatasetGraphs are
running in other JVMs.
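
A minimal sketch of the tryBegin idea described above, written so that it
compiles without Jena. Everything in it is an assumption made for
illustration: the ReadWrite enum is a stand-in for Jena's, and Txn, TryTxn
and demo() are hypothetical names, not the actual DatasetGraphTry code. It
guards the delegate with a java.util.concurrent read/write lock and only
ever calls tryLock(), so the caller never blocks.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class TryBeginSketch {
    /** Stand-in for Jena's ReadWrite enum so the sketch compiles without Jena. */
    enum ReadWrite { READ, WRITE }

    /** Hypothetical minimal surface of the wrapped dataset; not a Jena interface. */
    interface Txn {
        void begin(ReadWrite mode);
        void end();
    }

    /** Hypothetical wrapper: takes its own MRSW lock with tryLock() before delegating. */
    static final class TryTxn {
        private final Txn delegate;
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final ThreadLocal<ReadWrite> held = new ThreadLocal<>();

        TryTxn(Txn delegate) { this.delegate = delegate; }

        /** Returns immediately: true if the lock was taken and begin() was called. */
        boolean tryBegin(ReadWrite mode) {
            if (held.get() != null) throw new IllegalStateException("already in a transaction");
            Lock l = lockFor(mode);
            if (!l.tryLock()) return false;      // contended: do not wait
            try {
                delegate.begin(mode);            // "should" not block, we hold the lock
            } catch (RuntimeException e) {
                l.unlock();                      // do not leak the lock if begin() fails
                throw e;
            }
            held.set(mode);
            return true;
        }

        /** Ends the delegate transaction and releases the lock taken by tryBegin. */
        void end() {
            ReadWrite mode = held.get();
            if (mode == null) return;
            try {
                delegate.end();
            } finally {
                held.remove();
                lockFor(mode).unlock();
            }
        }

        private Lock lockFor(ReadWrite mode) {
            return mode == ReadWrite.READ ? lock.readLock() : lock.writeLock();
        }
    }

    /** A writer on this thread, then a reader on another thread during and after the write. */
    static boolean[] demo() {
        Txn noop = new Txn() {
            public void begin(ReadWrite mode) { }
            public void end() { }
        };
        TryTxn dg = new TryTxn(noop);
        Callable<Boolean> tryRead = () -> {
            boolean ok = dg.tryBegin(ReadWrite.READ);
            if (ok) dg.end();
            return ok;
        };
        ExecutorService other = Executors.newSingleThreadExecutor();
        try {
            boolean writer = dg.tryBegin(ReadWrite.WRITE);
            boolean readerDuringWrite = other.submit(tryRead).get();
            dg.end();
            boolean readerAfterWrite = other.submit(tryRead).get();
            return new boolean[] { writer, readerDuringWrite, readerAfterWrite };
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            other.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

Note that a wrapper lock like this only gives the non-blocking guarantee if
every caller goes through the wrapper; anything calling begin(ReadWrite) on
the delegate directly bypasses it.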

Currently I've reverted to mapping on the DatasetGraph class (which requires
me to check the Jena code manually), but I'd like to understand why and
possibly make the code neater...

To automate the wrapping I pulled the Lock via getLock() and used its class
to look up the appropriate wrapper. But after digging I noticed that the
Lock from getLock() doesn't always match the Transactional locking
semantics.

DatasetGraphInMemory getLock() returns org.apache.jena.shared.LockMRSW, but
internally its Transactional implementation uses
org.apache.jena.shared.LockMRPlusSW, which is subtly different. This comes
about because getLock() isn't overridden but is inherited from
DatasetGraphBase, which declares LockMRSW.
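
To make the "subtly different" concrete, here is a small sketch of the two
policies as try-locks. It is not Jena's LockMRSW or LockMRPlusSW code; the
class names are hypothetical, and the MR+SW rule is my understanding of that
policy (readers are never refused, writers only exclude other writers), so
treat it as an assumption.

```java
final class LockSemanticsSketch {
    /** Hypothetical non-blocking lock surface used only by this sketch. */
    interface TryLock {
        boolean tryEnter(boolean write);
        void leave(boolean write);
    }

    /** MRSW: many readers OR one writer, never both. */
    static final class MrswTry implements TryLock {
        private int readers;
        private boolean writer;

        public synchronized boolean tryEnter(boolean write) {
            if (writer) return false;                // an active writer excludes everyone
            if (write) {
                if (readers > 0) return false;       // active readers exclude a writer
                writer = true;
            } else {
                readers++;
            }
            return true;
        }

        public synchronized void leave(boolean write) {
            if (write) writer = false; else readers--;
        }
    }

    /** MR+SW (assumed rule): readers always enter, at most one writer at a time. */
    static final class MrPlusSwTry implements TryLock {
        private boolean writer;

        public synchronized boolean tryEnter(boolean write) {
            if (!write) return true;                 // readers are never refused
            if (writer) return false;                // only a second writer is refused
            writer = true;
            return true;
        }

        public synchronized void leave(boolean write) {
            if (write) writer = false;
        }
    }

    /** Tries writer, then reader, then a second writer; reports which ones got in. */
    static String probe(TryLock lock) {
        boolean w1 = lock.tryEnter(true);
        boolean r = lock.tryEnter(false);
        boolean w2 = lock.tryEnter(true);
        if (w2) lock.leave(true);
        if (r) lock.leave(false);
        if (w1) lock.leave(true);
        return w1 + "," + r + "," + w2;
    }

    public static void main(String[] args) {
        System.out.println("MRSW  " + probe(new MrswTry()));      // MRSW  true,false,false
        System.out.println("MR+SW " + probe(new MrPlusSwTry()));  // MR+SW true,true,false
    }
}
```

The only difference in the probe is the reader that arrives while a writer
is active, which is exactly the case a wrapper keyed on the class returned
by getLock() would get wrong.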

A TDB backed DatasetGraph masquerades as a:

DatasetGraphTransaction
DatasetGraphTrackActive
DatasetGraphWrapper

which wraps the DatasetGraphTDB:

DatasetGraphTripleQuads
DatasetGraphBaseFind
DatasetGraphBase

where getLock() returns the following:



INFO Thread[main,5,main] [class org.apache.jena.sparql.core.mem.DatasetGraphInMemory]
INFO Thread[main,5,main] [class org.apache.jena.shared.LockMRSW]

INFO Thread[main,5,main] [class org.apache.jena.tdb.transaction.DatasetGraphTransaction]
INFO Thread[main,5,main] [class org.apache.jena.shared.LockMRSW]
INFO Thread[main,5,main] [class org.apache.jena.tdb.store.DatasetGraphTDB]
INFO Thread[main,5,main] [class org.apache.jena.shared.LockMRSW]

Regards Dick.

Re: Understanding DatasetGraph getLock() (DatasetGraphInMem throwing a curve ball)...

Posted by Dick Murray <da...@gmail.com>.
Thanks for the background. I'll map to TDB and Mem and throw a UOE if
"another" DG is encountered.

Same here, I drew a blank on a Jena optimistic lock or try lock. So I've
created a LockMRAndMW (effectively lazy) which is used to control the
DatasetGraphDistributed, i.e. there is no blocking in begin(ReadWrite). Then,
when the streams are called (e.g. find(...)), the actual DGs have their read
transactions started. There are also a LockMRSWTry and a LockMRPlusSWTry,
which wrap the TDB and Mem lock semantics.
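
A rough sketch of the class-based mapping with the UOE fallback. The dataset
classes here are empty stand-ins so that it compiles without Jena, and the
pairing (TDB with MRSW, Mem with MR+SW) simply follows the order given
above; all names are hypothetical.

```java
import java.util.Map;

final class WrapperLookupSketch {
    // Empty stand-ins for the real dataset classes; assumptions for this sketch only.
    static class DatasetGraphStandIn { }
    static final class TdbDataset extends DatasetGraphStandIn { }
    static final class MemDataset extends DatasetGraphStandIn { }
    static final class OtherDataset extends DatasetGraphStandIn { }

    /** Which try-lock flavour a dataset needs. */
    enum Semantics { MRSW, MR_PLUS_SW }

    // Keyed on the dataset class itself, not on the class of the Lock from getLock().
    private static final Map<Class<?>, Semantics> KNOWN = Map.of(
            TdbDataset.class, Semantics.MRSW,
            MemDataset.class, Semantics.MR_PLUS_SW);

    /** Looks up the semantics for a known dataset class, refusing anything else. */
    static Semantics semanticsFor(DatasetGraphStandIn dg) {
        Semantics s = KNOWN.get(dg.getClass());
        if (s == null) {
            throw new UnsupportedOperationException(
                    "No try-lock mapping for " + dg.getClass().getName());
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(semanticsFor(new TdbDataset()));
        System.out.println(semanticsFor(new MemDataset()));
        try {
            semanticsFor(new OtherDataset());
        } catch (UnsupportedOperationException e) {
            System.out.println("UOE: " + e.getMessage());
        }
    }
}
```

Failing loudly on an unknown class is the point: a new DG implementation
with different transactional semantics gets noticed instead of being
silently wrapped with the wrong lock.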

It was REALLY important for us that we don't block on the begin(ReadWrite)
call, as we are currently aggregating 18 separate JVM TDB/Mem instances into
one DG (via a Thrift DG implementation). Specifically, when we perform an
ETL we try each remote DG until we acquire a write lock, then the quads are
loaded. This way we can support multiple concurrent writes, as we
effectively shard the TDB, and we reduced bulk ETL load times from the sum
of all load times to, simplistically, the longest load time (assuming we
have enough shards...).
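
The "try each DG until we get a write lock" loop can be sketched roughly as
follows. This is an illustration only: Shard, load() and maxPasses are
hypothetical names, strings stand in for quads, and an AtomicBoolean stands
in for the remote try-lock that would really go over Thrift.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class ShardedLoadSketch {
    /** Hypothetical shard: a quad store guarded by a non-blocking write lock. */
    static final class Shard {
        final String name;
        final List<String> quads = new ArrayList<>();
        private final AtomicBoolean writing = new AtomicBoolean(false);

        Shard(String name) { this.name = name; }

        /** Returns immediately; true only if no other writer holds this shard. */
        boolean tryBeginWrite() { return writing.compareAndSet(false, true); }

        void endWrite() { writing.set(false); }
    }

    /**
     * Walks the shards in order and loads the batch into the first one whose
     * write lock can be taken without waiting. Returns that shard, or null if
     * every shard stayed busy for maxPasses full passes.
     */
    static Shard load(List<Shard> shards, List<String> batch, int maxPasses) {
        for (int pass = 0; pass < maxPasses; pass++) {
            for (Shard shard : shards) {
                if (!shard.tryBeginWrite()) continue;   // busy: move on, do not block
                try {
                    shard.quads.addAll(batch);
                    return shard;
                } finally {
                    shard.endWrite();
                }
            }
            Thread.yield();                             // all busy: back off briefly
        }
        return null;
    }

    /** Shard "a" is busy for the first load and free for the second. */
    static String demo() {
        Shard a = new Shard("a");
        Shard b = new Shard("b");
        List<Shard> shards = List.of(a, b);
        a.tryBeginWrite();                              // simulate another loader on "a"
        Shard first = load(shards, List.of("q1", "q2"), 1);
        a.endWrite();
        Shard second = load(shards, List.of("q3"), 1);
        return first.name + "," + second.name;
    }

    public static void main(String[] args) {
        System.out.println(demo());                     // b,a
    }
}
```

With several loaders running this loop concurrently, each batch lands on
whichever shard is free, which is where the "longest load time rather than
the sum" effect comes from.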

Internally the sharded DGs are only locked when they are touched.

The majority of DGs are TDB backed, but the system recognises certain
"things" and will spin up a Mem backed DG in another JVM to perform ad hoc
work, then tear it down.

On 24 March 2017 at 11:41, A. Soroka <aj...@virginia.edu> wrote:

> The lock from getLock is always the same semantics for every impl--
> currently MRSW, with no expectation for changing. It's a kind of "system
> lock" to keep the internal state of that class consistent. That's distinct
> from the transactional semantics of a given impl. In some cases, the
> semantics happen to coincide, when the actual transactional semantics are
> also MRSW. But sometimes they don't (actually, I think DatasetGraphInMem is
> the only example where they don't right now, but I am myself tinkering with
> another example and I am confident that we will have more). When they
> don't, you need to rely on the impl to manage its own transactionality, via
> the methods for that purpose.  I'm not actually sure we have a good
> non-blocking method for your use right now. We have inTransaction(), but
> that's not too helpful here.
>
> But someone else can hopefully point to a technique that I am missing.
>
>
> ---
> A. Soroka
> The University of Virginia Library

Re: Understanding DatasetGraph getLock() (DatasetGraphInMem throwing a curve ball)...

Posted by "A. Soroka" <aj...@virginia.edu>.
The lock from getLock is always the same semantics for every impl-- currently MRSW, with no expectation for changing. It's a kind of "system lock" to keep the internal state of that class consistent. That's distinct from the transactional semantics of a given impl. In some cases, the semantics happen to coincide, when the actual transactional semantics are also MRSW. But sometimes they don't (actually, I think DatasetGraphInMem is the only example where they don't right now, but I am myself tinkering with another example and I am confident that we will have more). When they don't, you need to rely on the impl to manage its own transactionality, via the methods for that purpose.  I'm not actually sure we have a good non-blocking method for your use right now. We have inTransaction(), but that's not too helpful here.

But someone else can hopefully point to a technique that I am missing.


---
A. Soroka
The University of Virginia Library
