Posted to dev@jena.apache.org by Claude Warren <cl...@xenei.com> on 2016/01/02 15:18:23 UTC

A proposal for a new locking strategy

Currently most Jena implementations use a multiple read one write
solution.  However, I think that it is possible (with minimal work) to
provide a solution that would allow for multiple writers by using lower
level locks.

I take inspiration from the Privileges code.  That code allows privileges
to be determined down to the triple level.  Basically it does the following
{noformat}
start
 |
 v
may user perform operation on graph? → (no) (restrict)
 |
 v
(yes)
may user perform operation on any triple in graph → (yes) (allow)
 |
 v
(no)
may user perform operation on the specific triple in graph → (yes) (allow)
 |
 v
(no) (restrict)
{noformat}

My thought is that the locking may work much the same way.  Once one thread
has the objects locked, no other thread may lock those objects.  The
process would be something like:

Graph locking would require either an exclusive or a non-exclusive lock.  If the
entire graph were to be locked for writing (as in the current system) then
the request would be for an exclusive write-lock on the graph.  Once an
exclusive write lock has been established no other write lock may be
applied to the graph or any of its triples by any other thread.

If a thread only wanted to lock part of the graph, for example all triples
matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
write lock on the graph.  It would then acquire an exclusive write lock on
all triples matching <u:foo ANY ANY>.  Once that triple match lock was
acquired, no other thread would be able to lock any triple whose subject was
u:foo.

The lock request would need to contain the graph name and (in the case of a
partial graph lock) a set of triple patterns to lock.  The flow for the
lock would be something like:

{noformat}
start
 |
 v
does the thread hold an exclusive graph lock → (yes) (success)
 |
 v
(no)
does the thread want an exclusive graph lock → (yes) (go to ex graph lock)
 |
 v
(no)
does the thread hold a non-exclusive graph lock → (no) (go to nonex graph lock)
 |
 v
(yes) (lbl:lock acquired)
can the thread acquire all the triple locks  → (yes) (success)
 |
 v
(no) (failure)


(lbl: nonex graph lock)
does any thread hold an exclusive graph lock → (yes) (failure)
 |
 v
(no)
acquire non-exclusive graph lock
(goto lock acquired)


(lbl: ex graph lock)
does any thread hold an exclusive graph lock → (yes) (failure)
 |
 v
(no)
does any thread hold a non-exclusive graph lock → (yes) (failure)
 |
 v
(no)
acquire exclusive graph lock
(success)

{noformat}
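
A minimal sketch of how that flow might be tracked for one graph (class and
method names here are hypothetical, not existing Jena API; the pattern-conflict
test it relies on is sketched after the next paragraph):

{noformat}
import java.util.*;
import org.apache.jena.graph.Triple;

// Hypothetical sketch of the decision flow above - illustrative names only.
class GraphLockState {
    private Thread exclusiveHolder = null;                 // holder of the exclusive graph lock, if any
    private final Set<Thread> nonExclusiveHolders = new HashSet<>();
    private final Map<Thread, Set<Triple>> patternLocks = new HashMap<>();

    // Returns true if the lock is granted, false on conflict (no waiting).
    synchronized boolean tryLock(Thread t, boolean exclusive, Collection<Triple> patterns) {
        if (exclusiveHolder == t)
            return true;                                    // already holds the exclusive graph lock
        if (exclusive) {                                    // "ex graph lock" branch
            if (exclusiveHolder != null || !nonExclusiveHolders.isEmpty())
                return false;
            exclusiveHolder = t;
            return true;
        }
        if (exclusiveHolder != null)                        // "nonex graph lock" branch
            return false;
        // "lock acquired": fail if any requested pattern conflicts with another thread's patterns.
        for (Map.Entry<Thread, Set<Triple>> e : patternLocks.entrySet()) {
            if (e.getKey() == t) continue;
            for (Triple held : e.getValue())
                for (Triple wanted : patterns)
                    if (conflicts(held, wanted))
                        return false;
        }
        nonExclusiveHolders.add(t);
        patternLocks.computeIfAbsent(t, k -> new HashSet<>()).addAll(patterns);
        return true;
    }

    synchronized void releaseLock(Thread t) {               // release everything the thread holds
        if (exclusiveHolder == t) exclusiveHolder = null;
        nonExclusiveHolders.remove(t);
        patternLocks.remove(t);
    }

    // Conservative placeholder; a positional overlap test is sketched after the next paragraph.
    private static boolean conflicts(Triple held, Triple wanted) { return true; }
}
{noformat}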

The permissions system uses an abstract engine to determine if the user has
access to the triples.  For the locking mechanism the system needs to track
graph locks and triple patterns locked.  If a new request for a triple
pattern matches any existing (already locked) pattern the lock request
fails.
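
One reading of "matches" here is positional overlap: two patterns can both
cover some concrete triple exactly when, in each of the subject, predicate and
object positions, the two values are equal or at least one of them is a
wildcard.  A small sketch of that test, using Jena's Node.ANY as the wildcard
(the class name is illustrative):

{noformat}
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;

final class PatternConflict {
    // Two patterns conflict if some concrete triple could match both,
    // i.e. every position is equal or is ANY on at least one side.
    static boolean conflicts(Triple a, Triple b) {
        return compatible(a.getSubject(),   b.getSubject())
            && compatible(a.getPredicate(), b.getPredicate())
            && compatible(a.getObject(),    b.getObject());
    }

    private static boolean compatible(Node x, Node y) {
        return x == Node.ANY || y == Node.ANY || x.equals(y);
    }
}
{noformat}

For example, <u:foo ANY ANY> conflicts with <ANY u:p ANY> (a triple
<u:foo u:p o> would match both) but not with <u:bar ANY ANY>.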

The simple releaseLock() will release all locks the thread holds.

Note that the locking system does not check the graph being locked to see
if the items exist in the graph; it is simply tracking patterns of locks and
determining whether there are any conflicts between the patterns.

Because this process can duplicate the current locking strategy, it can be
used as a drop-in replacement in the current code.  Current code would
continue to operate as it does today, but future development could become
more sensitive to locking named graphs and to partial updates, providing
multi-threaded updates.

Thoughts?
Claude

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
Your proposal is interesting and it would be good to have alternative 
implementations for graph/dataset that provide different characteristics.

Adam was talking about multiple writer concurrency just recently - this 
looks to be in the same area.

As a nested lock scheme, there are some technical questions that any 
scheme of this class has to take a position on. I'll take those to a 
different reply.

The system wide MRSW policy, which is a baseline assumption of usage,
exists to

(1) protect the implementation datastructures
(2) give consistency of access

MRSW as a policy applies to graphs and datasets but also to the 
inference engines, so it is not just a matter of triple patterns. Dave might 
be able to see a way, but to me it looks like either a major change or locking 
applied in a way that is no change from the present.

(1) => A Triple in the memory graph is indexed 3 ways - all three 
indexes move in-step. So MRSW locking provides consistent access/update 
and safe data structure access. The data structures aren't concurrent 
safe and there is no point making them so because there are multiple 
indexes. ConcurrentHashMap would not be enough - it needs coordination 
across three different maps.

Whatever you do in a new locking strategy, you have to do this.
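
A toy illustration of the point (the real GraphMem indexing classes are more
involved; this just shows the shape of the problem):

{noformat}
import java.util.*;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;

// Illustrative only - not the real GraphMem data structures.
class ThreeIndexStore {
    private final Map<Node, Set<Triple>> bySubject   = new HashMap<>();
    private final Map<Node, Set<Triple>> byPredicate = new HashMap<>();
    private final Map<Node, Set<Triple>> byObject    = new HashMap<>();

    // All three indexes must move in step: a reader that sees the triple in the
    // subject index but not yet in the object index gets inconsistent answers,
    // so the whole update is one critical section - making each map individually
    // concurrent (e.g. ConcurrentHashMap) would not be enough.
    synchronized void add(Triple t) {
        bySubject.computeIfAbsent(t.getSubject(), k -> new HashSet<>()).add(t);
        byPredicate.computeIfAbsent(t.getPredicate(), k -> new HashSet<>()).add(t);
        byObject.computeIfAbsent(t.getObject(), k -> new HashSet<>()).add(t);
    }
}
{noformat}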

Even TxnMem, which uses what is technically described as a "lock-free 
algorithm", still has internal mutexes for changes to several things at 
the same time.

(2) => Consistency (in database-speak, isolation) matters because an 
application at the API level, and more importantly in SPARQL for most of 
our users making concurrent use of Jena, is making a number of accesses 
into the graph glued together by code.

MRSW applies to critical sections, and being a non-nested lock system 
over the whole graph/dataset, means that applications see a consistent 
view of the data.

     Andy



Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
I’m really glad to read this thread! MRMW operation could lead to some great performance gains for the users.

Andy mentioned that I’ve been looking at some overlapping issues, in the context of a “writer-per-named-graph” dataset design. I’ll comment in-line.

---
A. Soroka
The University of Virginia Library

> On Jan 3, 2016, at 8:45 AM, Claude Warren <cl...@xenei.com> wrote:
>> As I understand it, the application is now involved to describe the part of the graph to lock. Is that right?
> 
> That was my thought yes.  I was only looking at how to lock for write.  I had not considered read locking, though I expected it would work like the current lock does in the sense that read locks block write locks.  Though I suppose it might be possible in some situations to specify what the thread was going to read.

In TxnMem I was able to use persistent datastructures to allow readers to continue while a writer was operating (with isolation). That may not be a characteristic we can always support, but it might be useful to think about the cases where we can support it. (E.g. when the readers and writers know in advance what is to be covered, or when a reader is doing something that the application knows is not going to be affected by some particular writers, even though they may be working over the same regions.) 
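
Roughly, the shape of that (greatly simplified relative to DatasetGraphInMemory,
which uses persistent maps rather than whole-set copies; the class here is a toy):

{noformat}
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.jena.graph.Triple;

class SnapshotStore {
    private final AtomicReference<Set<Triple>> root =
            new AtomicReference<>(Collections.emptySet());

    // A reader captures the current root and keeps using it; writers never
    // mutate a published snapshot, so readers continue in isolation while a
    // write is in progress.
    Set<Triple> snapshot() {
        return root.get();
    }

    // A writer builds its new version off to the side and publishes it in one
    // step; only this swap needs to be serialized (the short "commit lock").
    synchronized void commit(Set<Triple> newVersion) {
        root.set(Collections.unmodifiableSet(new HashSet<>(newVersion)));
    }
}
{noformat}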

>> Does failure mean an exception or waiting?
> My thought was failure/exception.  As you point out waiting leads to deadlock issues.  However, the decision to wait or fail could be implemented outside of the engine that determines if the lock can be granted.  So it could be a pluggable decision.  For the first cut I would simply implement failure.
> But looking at the enterCriticalSection() code I see that it does not allow failure, but that there is a mention of an error if a read lock is promoted to a write lock.  I am unsure how to handle this case.  I think that a lock failure exception may be needed.

I think failing by default is going to be confusing to many users of this machinery. Indirecting the decision seems like a good way to provide flexibility, and I suspect that a default of blocking on the lock will be more “ergonomic” for users.

> A possible solution is to delay and retry with a retry time out/limit before throwing a timeout type exception.  This area needs work.

Blocking-with-policy is nicely flexible. Andy showed a nice idiom that could be extended to this kind of use:

https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/sparql/core/mem/DatasetGraphInMemory.java#L210

Perhaps something like:

{noformat}

LockAcquisitionPolicy policyDefault = …;
LockAcquisitionPolicy specialPolicy = …;
LockRegions locks = new LockRegions(policyDefault);
Triple lockregion1 = ...;
locks.addRegion(lockregion1, ReadWrite.WRITE); // do not acquire lock, just add region to scope
Triple lockregion2 = …;
locks.addRegion(lockregion2, ReadWrite.READ, specialPolicy); // do not acquire lock, just add region to scope

Consumer<LockRegions> task = regions -> {
    doSomeStuff();
    regions.unlock(lockregion1);
    doSomeMoreStuff();
    regions.unlock(lockregion2);
    finishUp();
};
try {
    withLocks(locks, task); // acquire locks and execute task with them
} catch (LockException e) {
    log.error("Urg!");
} finally {
    locks.unlock();
}
{noformat}

The LockRegions object could then be reused (possibly with regions added or removed) after .unlock(), so the work of determining what triples map into an entity doesn’t have to be entirely redone.

>> How far back are you going to wind back to? In a transaction system either the transaction is aborted (so all the way back) or some systems retry (i.e. rerun application code).
>> 
> I had assumed that the application would lock all the triples it needed at the start of the lock.

This seems non-ideal to me from a user standpoint. It would be nice to be able to add locks on the fly (possibly blocking) because in a schemaless database, the application may not know at the beginning of the locked section what all of the triples involved in an entity are, and there may be no way to discover that short of read-locking some triples and developing more queries based on those triples and iterating that.

> So lets assume code execution like:
> {noformat}
> {
>    try {
>        Lock( <foo ANY ANY> )
>        // some processing here
>        try {
>            Lock( <bar ANY ANY >)
>            // some processing here
>        } catch (LockFailedException e)  {
>            // handle "bar" lock failure
>        } finally {
>            Unlock( <bar ANY ANY > )
>        }
>    } catch (LockFailedException e ) {
>        // handle "foo" lock failure
>    } finally {
>        Unlock( <foo, ANY ANY > )
>   }
>   // for grins ;)
>   Unlock(); // release all held locks.
> }
> 
> {noformat}

It would be really nice if the locking API produced some kind of object that can be carried away and shared. In other conversations we’ve begun to touch on the possibility of transactions shared between threads, and it would be nice to plan for that as early as possible.

> If we consider the lock request to be a set of patterns then the lock itself holds a set of patterns.  For example Lock( <foo ANY ANY> <bar ANY ANY> ) would create a lock holding the fooAny and barAny triples. Subsequently calling Lock( <baz ANY ANY> ) would result in the set: <foo ANY ANY> <bar ANY ANY> <baz ANY ANY>.  Calling Unlock( <bar ANY ANY> ) would result in the set: <foo ANY ANY> <baz ANY ANY>.

This gets a little trickier when the patterns overlap, doesn’t it? Or is the contract here that Unlock can only be used with a pattern that exactly matches a pattern that has been used with Lock?

> I think that the starting place may be to extend the Lock interface to include:
> enterCriticalSection( triple .... );  // establish a lock for the triples
> Graph getLockPattern(); // get the lock pattern
> void lock( triple ... ); // add to lock pattern
> void unlock( triple ...); // remove from lock pattern
> void unlock(); // remove all locks
> 
> It might make more sense to put the last 4 into a separate interface and have a method to return an instance of that interface.  Something like:
> 
> LockHoldings getLocks();
> 
> It seems to me the complex part is going to be determining if any thread already has a lock on the object from the patterns.  I can dig into that to see if I can get an early prototype of lock tracking.

This is something like a generalization of where I ended up for my draft of “per-graph” locking. I have a LockSet that is pretty simple because I only have to deal with a single pattern <g ANY ANY ANY>. In my case, I was able to determine who is locking what fairly easily with set intersection. It might be worth trying to think about how to build up simpler cases (e.g. patterns that partition the dataset between them) into the more general case of arbitrary patterns.
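
A hypothetical reconstruction of that simpler case, where a request is just a
set of graph names and two requests conflict exactly when their name sets
intersect:

{noformat}
import java.util.*;
import org.apache.jena.graph.Node;

// Illustrative sketch only, not the actual draft LockSet.
class GraphNameLockSet {
    private final Map<Thread, Set<Node>> held = new HashMap<>();

    synchronized boolean tryLock(Thread t, Set<Node> wantedGraphs) {
        for (Map.Entry<Thread, Set<Node>> e : held.entrySet()) {
            if (e.getKey() == t) continue;
            if (!Collections.disjoint(e.getValue(), wantedGraphs))
                return false;            // another thread already locks one of these graphs
        }
        held.computeIfAbsent(t, k -> new HashSet<>()).addAll(wantedGraphs);
        return true;
    }

    synchronized void release(Thread t) { held.remove(t); }
}
{noformat}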


Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
> On Jan 24, 2016, at 9:24 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> The structure locks have a different lifetime as well.  One might be needed for the iterator to move to the next lump of triples, where "lump" is due to the implementation of the dataset - in TDB the iterator will be walking only one index and so that part of that index must not be mutated at the point where the iterator dives back in to read some more items.  Coordinating with the other indexes is left as an exercise for the reader (hmm - actually it looks amusingly tricky in the general case).

It seems like one effect of this move would be to force some changes to handling transactions together with locks. Currently, at least some transactional components (e.g. some Datasets) handle locking that is for both “structure” and “data” as part of transaction control methods.

> Background digression:
> Many databases nowadays have multi-version data - see MVCC - where both last committed and "being changed" are around.  For RDF that gets potentially high overhead with many true multiple writers as quads are small so admin bytes can be a high %-age.  Using MVCC is a whole different design space.

TxnMem begins to step into this area in a small way. This is one reason I would like at some point to explore techniques for mutate-in-place-within-one-transaction semantics (what Clojure calls “transient” data structures, although they intend mutate-in-place-within-one-thread). That should save some amount of the overhead.

> Structure locks are short term - like the commit lock in TxnMem when the global root is updated.

Indeed, that lock protects against getting the dataset's internal state corrupted, but doesn’t guard any data, really.

> The case of partition-by-graph is simpler (much simpler?).  It's almost a top level ConcurrentHashMap of graphs and then per graph MRSW.

I suspect that any partitioning scheme is simpler (although I admit I haven’t got a formal argument for that), and that’s one reason I want to investigate partition-by-graph. I did try MR+SW per graph to begin with, but I haven’t had a chance to reason through how reads and writes against different graphs interact. It may be too much for me.

> This affects the index choices; default union graph might have to be a loop as it is in dataset general but "default union graph" with per graph updates happening is going to be a friction point anyway.

Even though all the graphs are in one dataset, if the underlying graphs can be locked independently, union graph would be in some ways like a view over multiple bases. Most views in Jena that I have used are more often a single graph out of a dataset— even with independently lockable graphs, the view would be within one lockable region that is the base, if I am making myself clear… {grin}

---
A. Soroka
The University of Virginia Library


Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On 21/01/16 17:31, A. Soroka wrote:
>> On Jan 19, 2016, at 8:27 AM, Andy Seaborne <an...@apache.org>
>> wrote:
>>
>> Structure locks are a feature of the implementation like java's
>> "synchronized". "structure locks" serve a different purpose to
>> "data locks" (the terminology is invented - there is probably a
>> proper set of terms). They stop crashes, NPEs and generally
>> corrupting the datastructures, are datastructure specific but do
>> not care about consistency of the data and operations on the data
>> from the application, data model, point of view. What is needed may
>> be influenced by the upper level locking or it might be that the
>> structure locks are a set of guarantees to build on.
>
> Let me use the example (Iterator<Triple> iter = graph.find(S,P,O) ;)
> to check my understanding of your thinking.
>
> Let’s say I got the graph from an underlying Dataset (it’s a view)
> and I must assume that writers are acting around me. If I want to use
> the two systems of locking, I acquire a “data” lock on the pattern
> (S,P,O) from the graph, and that action acquires a “data” lock in the
> Dataset, that also appears in the locking API of the graph and of any
> other views into the Dataset. I also acquire a "structure" lock on
> the graph, and that would be implemented in the graph, so it’s not
> visible to the Dataset or to other views into that Dataset.

So far, so good ...

> The “data” lock is used to manage consistency of my triples from an
> application POV irrespective of how they are stored or accessed, and
> the “structure” lock prevents someone else from doing something to
> the graph object that could clash with what I am doing to throw it
> into an inconsistent state (not throw the data into an inconsistent
> state from the application POV, but actually mung up the state of the
> object itself resulting in an exception or unpredictable wrong
> behavior). Is that right?

Yes.

The structure locks have a different lifetime as well.  One might be 
needed for the iterator to move to the next lump of triples, where 
"lump" is due the implementation of the dataset - in TDB the iterator 
will be walking only one index and so that part of that index must not 
be mutated at the point where the iterator dives back in to read some 
more items.  Coordinating with the other indexes is left as an exercise 
for the reader (hmm - actually it looks amusingly tricky in the general 
case).

Background digression:

Many databases nowadays have multi-version data - see MVCC - where both 
last committed and "being changed" are around.  For RDF that gets 
potentially high overhead with many true multiple writers, as quads are 
small, so admin bytes can be a high percentage.  Using MVCC is a whole 
different design space.

> If that is correct, I am then trying to understand what that means
> for the semantics of the “structure” locks. It starts to seem like
> they must be MRSW in most cases, with exceptions available when the
> implementation supports more powerful behavior. It’s almost as though
> the current system locks evolve into “structure” locks and the “data”
> locks are overlaid to support application-specific notions.

Structure locks are short term - like the commit lock in TxnMem when the 
global root is updated.

The case of partition-by-graph is simpler (much simpler?).  It's almost 
a top level ConcurrentHashMap of graphs and then per graph MRSW.

This affects the index choices; default union graph might have to be a 
loop as it is in dataset general but "default union graph" with per 
graph updates happening is going to be a friction point anyway.

And remember ConcurrentHashMap is only "mostly concurrent"! Two writes 
to the same segment use exclusive locking.
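
A minimal sketch of that shape (names are illustrative; a plain
java.util.concurrent ReadWriteLock stands in for the per-graph MRSW lock):

{noformat}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;
import org.apache.jena.graph.Node;

// Top-level concurrent map of graph name -> MRSW lock for that graph.
class PerGraphLocks {
    private final ConcurrentHashMap<Node, ReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReadWriteLock lockFor(Node graphName) {
        return locks.computeIfAbsent(graphName, g -> new ReentrantReadWriteLock());
    }

    <T> T read(Node graphName, Supplier<T> action) {
        ReadWriteLock l = lockFor(graphName);
        l.readLock().lock();
        try { return action.get(); } finally { l.readLock().unlock(); }
    }

    void write(Node graphName, Runnable action) {
        ReadWriteLock l = lockFor(graphName);
        l.writeLock().lock();
        try { action.run(); } finally { l.writeLock().unlock(); }
    }
}
{noformat}

A default-union-graph read would then have to take the read lock on every
member graph (or loop over them), which is exactly the friction point
mentioned above.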

     Andy

>
> --- A. Soroka The University of Virginia Library
>
>


Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
> On Jan 19, 2016, at 8:27 AM, Andy Seaborne <an...@apache.org> wrote:
> 
> Structure locks are a feature of the implementation like java's "synchronized". "structure locks" serve a different purpose to "data locks" (the terminology is invented - there is probably a proper set of terms). They stop crashes, NPEs and generally corrupting the datastructures, are datastructure specific but do not care about consistency of the data and operations on the data from the application, data model, point of view. What is needed may be influenced by the upper level locking or it might be that the structure locks are a set of guarantees to build on.

Let me use the example (Iterator<Triple> iter = graph.find(S,P,O) ;) to check my understanding of your thinking.

Let’s say I got the graph from an underlying Dataset (it’s a view) and I must assume that writers are acting around me. If I want to use the two systems of locking, I acquire a “data” lock on the pattern (S,P,O) from the graph, and that action acquires a “data” lock in the Dataset, that also appears in the locking API of the graph and of any other views into the Dataset. I also acquire a "structure" lock on the graph, and that would be implemented in the graph, so it’s not visible to the Dataset or to other views into that Dataset.

The “data” lock is used to manage consistency of my triples from an application POV irrespective of how they are stored or accessed, and the “structure” lock prevents someone else from doing something to the graph object that could clash with what I am doing to throw it into an inconsistent state (not throw the data into an inconsistent state from the application POV, but actually mung up the state of the object itself resulting in an exception or unpredictable wrong behavior). Is that right?

If that is correct, I am then trying to understand what that means for the semantics of the “structure” locks. It starts to seem like they must be MRSW in most cases, with exceptions available when the implementation supports more powerful behavior. It’s almost as though the current system locks evolve into “structure” locks and the “data” locks are overlaid to support application-specific notions.

---
A. Soroka
The University of Virginia Library



Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On 14/01/16 17:14, A. Soroka wrote:
> If I understand the example correctly (and that is a big if, because
> I am good at missing the point of these things) either it needs both
> kinds of locks, or we have to be able to guarantee that all reads and
> writes occur inside a transaction and that all transactions have
> snapshot isolation and we would need some system of policy for
> transactions (which doesn’t seem remotely practical).

Yes - and across the iterator as well.

Whether the iterator sees the DB as at the time of the find() call 
(perfect isolation) or sees concurrent changes (e.g. read-committed-like 
effects) is a design decision.

> Now I am trying to reason about two overlaid systems of locks. It
> seems to me right now that the data locks could still live in the
> base data container, that is whatever object is not a view and has
> the actual node references in it (in Andy’s proposed possible design
> with Dataset as the “foundation” container, that always would be
> Dataset) or as Andy wrote, “on triples", and the structure locks
> would live in whatever object was used to create them.
>
> But Andy, it seems from your remark (“structure locks (on indexes)”)
> that you are saying that structure locks would live in whatever
> object has the actual node references in it, or am I misunderstanding
> you? Or are you saying that the structure locks would not respect
> classes/objects like Model, Graph, etc., but instead would be
> associated directly to the data structures that are storing node
> references?


Structure locks are a feature of the implementation like java's 
"synchronized". "structure locks" serve a different purpose to "data 
locks" (the terminology is invented - there is probably a proper set of 
terms).

They stop crashes, NPEs and generally corrupting the datastructures, are 
datastructure specific but do not care about consistency of the data and 
operations on the data from the application, data model, point of view.

What is needed may be influenced by the upper level locking or it might 
be that the structure locks are a set of guarantees to build on.

Java locks are built on "Syncs"
"Syncs" are built on OS features
At the bottom of the OS are spin locks.

The iterator example is particularly nasty because the iterator executes 
after the call to find() returns but in the ideal case, returns results 
consistent with the data as at the point of the find().
(c.f. concurrent modification exception)
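
To make the example concrete, a sketch against a plain in-memory graph
(assuming org.apache.jena.graph.Factory.createDefaultGraph(); the exact
failure mode depends on the graph implementation):

{noformat}
import org.apache.jena.graph.*;
import org.apache.jena.util.iterator.ExtendedIterator;

public class IteratorVsWriter {
    public static void main(String[] args) {
        Graph graph = Factory.createDefaultGraph();   // plain in-memory graph, no locking
        Node s = NodeFactory.createURI("urn:ex:s");
        Node p = NodeFactory.createURI("urn:ex:p");
        for (int i = 0; i < 1000; i++)
            graph.add(Triple.create(s, p, NodeFactory.createLiteral("o" + i)));

        // find() returns at once; the iterator walks the index lazily afterwards.
        ExtendedIterator<Triple> iter = graph.find(s, Node.ANY, Node.ANY);

        // A write landing mid-iteration (here same-thread for simplicity; a second
        // thread is the interesting case) mutates the index the iterator is walking:
        // typically a ConcurrentModificationException, and even when it is not, the
        // results are consistent with no single state of the graph.
        graph.add(Triple.create(s, p, NodeFactory.createLiteral("late")));

        while (iter.hasNext())
            System.out.println(iter.next());
    }
}
{noformat}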

	Andy



Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
If I understand the example correctly (and that is a big if, because I am good at missing the point of these things) either it needs both kinds of locks, or we have to be able to guarantee that all reads and writes occur inside a transaction and that all transactions have snapshot isolation and we would need some system of policy for transactions (which doesn’t seem remotely practical).

Now I am trying to reason about two overlaid systems of locks. It seems to me right now that the data locks could still live in the base data container, that is whatever object is not a view and has the actual node references in it (in Andy’s proposed possible design with Dataset as the “foundation” container, that always would be Dataset) or as Andy wrote, “on triples", and the structure locks would live in whatever object was used to create them.

But Andy, it seems from your remark (“structure locks (on indexes)”) that you are saying that structure locks would live in whatever object has the actual node references in it, or am I misunderstanding you? Or are you saying that the structure locks would not respect classes/objects like Model, Graph, etc., but instead would be associated directly to the data structures that are storing node references?

---
A. Soroka
The University of Virginia Library

> On Jan 8, 2016, at 4:20 PM, Andy Seaborne <an...@apache.org> wrote:
> 
> If you want an example to think about ...
> 
>   Iterator<Triple> iter = graph.find(S,P,O) ;
> 
> How does that work in any locking scheme when there may be writers about during iteration.
> 
> Looks like it needs both data locks (on triples) and structure locks
> 
> 	Andy


Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On 26/01/16 21:22, Claude Warren wrote:
> Sorry I have not been participating in this discussion as much as I would
> like, as I have been busy with several projects.
>
> My thought was that the "engine" would work on a single dataset, though
> something that works across datasets in an XA style would also make sense.
> My original idea was something at the graph level reflected in the model to
> support the locking that is found in the current system but allow for
> multiple writers where possible.
>
> This discussion has changed my views somewhat.  I still need to spend more
> time with datasets to understand how they play with models and graphs.
> Part of what drives me to do this is the idea that we could have multiple
> servers able to serve and update data from a "single" dataset/model/graph.

If that is the goal, then the locking part is easier (?) as the 
partitioning is fixed by the physical allocation - it's very like 
ajs6f's graph locking partition.  The partitions could even run 
independently, just with a front-end to direct updates.

The joy then comes in the dataset or the query engine trying to stitch 
the global view back together again.

     Andy



Re: A proposal for a new locking strategy

Posted by Claude Warren <cl...@xenei.com>.
Sorry I have not been participating in this discussion as much as I would
like, as I have been busy with several projects.

My thought was that the "engine" would work on a single dataset, though
something that works across datasets in an XA style would also make sense.
My original idea was something at the graph level reflected in the model to
support the locking that is found in the current system but allow for
multiple writers where possible.

This discussion has changed my views somewhat.  I still need to spend more
time with datasets to understand how they play with models and graphs.
Part of what drives me to do this is the idea that we could have multiple
servers able to serve and update data from a "single" dataset/model/graph.

Claude



-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
Claude, can I ask you to write more about `engine` in your outline below? Are you proposing some global entity that manages locks across more than one dataset? Or are you saying that something would be needed to coordinate between graphs, models, and datasets that might have dependencies amongst themselves, because right now, those different types can have independent state (instead of all of them being views on datasets, as in Andy’s idea)? In either case, it seems a little to me like the role of `engine` could be filled by a (as-yet-non-existent) Transaction object, if I am understanding correctly in either case.

I definitely would think you would want to acquire locking context from the dataset/model/graph with which you are working (or a view thereon). That would allow for optimizations.

---
A. Soroka
The University of Virginia Library



Re: A proposal for a new locking strategy

Posted by Claude Warren <cl...@xenei.com>.
structure locks (on indexes) are implementation specific, right?

I think this means that we need a way to register listeners that act
somewhat like participants in an XA locking session.

So we have something like

request for lock received.
lock the triples in the internal engine
check each lock listener and lock the triples in the listener.
if any listener cannot lock, undo the locks acquired thus far and abort.
otherwise return lock success.

Perhaps a LockListener interface like:

boolean exclusiveLock( Triple ... )
void exclusiveUnlock( Triple ... )
boolean sharedLock( Triple ... )
void sharedUnlock( Triple ... )

where Triple is any valid triple pattern like {<Foo> ANY ANY},  though
perhaps Quads would be a better choice.

Another possibility is that the engine lock call would return an object
that could then be used as the context for the lock listener.

LockContext ctxt = engine.lock( Triple ... );
for (LockListener listener : listeners )
{
    listener.exclusiveLock( ctxt, Triple ... );
}

The above example is missing all the "Back out the locks on failure" code.
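
A sketch of what that back-out might look like, assuming the hypothetical
LockListener/LockContext above plus an assumed engine.unlock(ctxt) to release
the engine-level locks:

{noformat}
// Sketch only: LockListener, LockContext and engine are the hypothetical types above.
boolean lockAll(LockContext ctxt, Triple... patterns) {
    java.util.List<LockListener> granted = new java.util.ArrayList<>();
    for (LockListener listener : ctxt.getListeners()) {
        if (!listener.exclusiveLock(ctxt, patterns)) {
            // Back out the locks on failure: unwind the listeners that said yes,
            // then release the engine-level lock, and report failure to the caller.
            for (LockListener ok : granted)
                ok.exclusiveUnlock(ctxt, patterns);
            engine.unlock(ctxt);     // assumed engine-side release
            return false;
        }
        granted.add(listener);
    }
    return true;
}
{noformat}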

Perhaps the LockContext should be retrieved from the Graph, Model or
Dataset so we have

LockContext ctxt = dataset.createLockContext()

engine.WriteLock( ctxt, Triple .... )

LockContext would have a getListeners() method to provide a set of
listeners so that the engine could call the appropriate listener checks.


Thoughts?

Claude





-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
If you want an example to think about ...

    Iterator<Triple> iter = graph.find(S,P,O) ;

How does that work in any locking scheme when there may be writers about 
during iteration.

Looks like it needs both data locks (on triples) and structure 
locks (on indexes).

	Andy

Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
I don’t think you need to worry at all, Claude. It’s my fault for being unclear. I was referring, not to what currently is the case, but what _would_ be the case if Jena went in directions that I understood you and Andy to have been tossing around recently in speculative sorts of ways.

That is, I was trying to understand how two ideas, neither of which has been formally adopted or is being implemented, would interact if both of them were implemented, just so as to get a better understanding of both.

Sorry if I caused any confusion.

---
A. Soroka
The University of Virginia Library



Re: A proposal for a new locking strategy

Posted by Claude Warren <cl...@xenei.com>.
hmmmm... I'm missing something here.  I thought Datasets were on top of
Models.  If Datasets are the fundamental object then I need to get an
implementation of permissions on datasets.  Dang, and my plate is rather
full at the moment. :(

I thought the layers went

Dataset
Model
Graph

Claude

On Fri, Jan 8, 2016 at 5:08 PM, A. Soroka <aj...@virginia.edu> wrote:

> I’d like to try to kick this thread around a bit, hopefully in a helpful
> way. I’ve been thinking about an idea that Andy has tossed around a bit:
> that Models, Graphs, etc. could just be views on underlying Datasets. I’ve
> been trying to understand how this would work in connection with Claude’s
> ideas about fine-grained, possibly nested locking. If I understand both
> ideas correctly, it seems to me that the following result would hold in a
> Jena that takes up both:
>
> Locks on any kind of object other than an underlying Dataset would be
> “views” (so to speak) over locks on the underlying Dataset. This would
> ensure that locks made via a view are visible to other views and the
> underlying Dataset and that locks made on the underlying Dataset are
> visible to all views.
>
> Does that claim seem reasonable (especially to Andy and Claude)?
>
> ---
> A. Soroka
> The University of Virginia Library
>
> > On Jan 3, 2016, at 8:45 AM, Claude Warren <cl...@xenei.com> wrote:
> >
> > Andy,
> > I have not read through the links in your last post yet but thought I
> might
> > answer the questions you posed as per my thoughts prior to reading those
> > posts. (concepts subject to change)
> >
> >
> >> As I understand it, the application is now involved to describe the part
> >> of the graph to lock. Is that right?
> >>
> >
> > That was my thought yes.  I was only looking at how to lock for write.  I
> > had not considered read locking, though I expected it would work like the
> > current lock does in the sense that read locks block write locks.
> Though I
> > suppose it might be possible in some situations to specify what the
> thread
> > was going to read.
> >
> >
> >>
> >> To check here - locks are held for a critical section (as current
> design)?
> >> Not a single API call? Something else?
> >>
> >> The permissions system is per API call but then permissions are not
> >> changing like data is on a live system.
> >>
> >>
> > I was thinking that they would be held for a critical section only as
> > currently done.
> >
> > I agree there are significant distinctions between the permissions and
> > locking as the permissions data does not change within a single call.
> >
> >
> >
> >>
> >> Does failure mean an exception or waiting?
> >>
> >>
> > My thought was failure/exception.  As you point out waiting leads to
> > deadlock issues.  However, the decision to wait or fail could be
> > implemented outside of the engine that determines if the lock can be
> > granted.  So it could be a pluggable decision.  For the first cut I would
> > simply implement failure.
> >
> > But looking at the enterCriticalSection() code I see that it does not
> allow
> > failure, but that there is a mention of an error if a read lock is
> promoted
> > to a write lock.  I am unsure how to handle this case.  I think that a
> lock
> > failure exception may be needed.
> >
> > A possible solution is to delay and retry with a retry time out/limit
> > before throwing a timeout type exception.  This area needs work.
> >
> >
> >>
> >> These tests themselves have to be done by some synchronized code because
> >> there are multiple tests while other threads may be changing things at
> the
> >> same time.
> >>
> >>
> > agreed.  The internals of the locking code need synchronization.
> >
> >
> >> How far back are you going to wind back to? In a transaction system
> either
> >> the transaction is aborted (so all the way back); some system retry
> (i.e.
> >> rerun application code).
> >>
> >
> > I had assumed that the application would lock all the triples it needed
> at
> > the start of the lock.  this is supposed to be a small critical section.
> > However, I can see how nested class method calls might cause multiple
> locks.
> >
> > So lets assume code execution like:
> > {noformat}
> > {
> >    try {
> >        Lock( <foo ANY ANY> )
> >        // some processing here
> >        try {
> >            Lock( <bar ANY ANY >)
> >            // some processing here
> >        } catch (LockFailedException e)  {
> >            // handle "bar" lock failure
> >        } finally {
> >            Unlock( <bar ANY ANY > )
> >        }
> >    } catch (LockFailedException e ) {
> >        // handle "foo" lock failure
> >    } finally {
> >        Unlock( <foo, ANY ANY > )
> >   }
> >   // for grins ;)
> >   Unlock(); // release all held locks.
> > }
> >
> > {noformat}
> >
> > The partial solution is to require lock ordering - locks must be acquired
> >> in the same order and same class - and for a sequence of operations, not
> >> just a single access.
> >>
> >> The application is now involved - this is a specialized programming
> model
> >> for the ability to have multiple writers.
> >>
> >
> > If we consider the lock request to be a set of patterns then the lock
> > itself holds a set of patterns.  For example Lock( <foo ANY ANY> <bar ANY
> > ANY> ) would create a lock holding the fooAny and barAny triples.
> > Subsequently calling Lock( <baz ANY ANY> ) would result in the set: <foo
> > ANY ANY> <bar ANY ANY> <baz ANY ANY>.  Calling Unlock( <bar ANY ANY> )
> > would result in the set: <foo ANY ANY> <baz ANY ANY>.
> >
> > So locks for a single thread do not need to be released in the order they
> > were acquired.  But all the locks must be released when the critical
> > section exits.
> >
> >
> >
> >> It would be a good thing to have as an additional choice of graph
> (ideally
> >> dataset) implementation.  A multi-writer dataset, then just make graph
> >> views of the dataset, would great to have.
> >>
> >> Is it easy to build an experimental version using the permissions code?
> >>
> >>        Andy
> >>
> >>
> > I'm not sure how much of the permissions code would be applicable.  It
> uses
> > dynamic proxies to intercept graph and model (and assorted helper
> classes)
> > method calls.
> >
> > I think the locking engine is more likely like a query engine
> > implementation having to do pattern matching on sets of triples.
> >
> > I think that the starting place may be to extend the Lock interface to
> > include:
> > enterCriticalSection( triple .... );  // establish a lock for the triples
> > Graph getLockPattern(); // get the lock pattern
> > void lock( triple ... ); // add to lock pattern
> > void unlock( triple ...); // remove from lock pattern
> > void unlock(); // remove all locks
> >
> > It might make more sense to put the last 4 into a separate interface and
> > have a method to return an instance of that interface.  Something like:
> >
> > LockHoldings getLocks();
> >
> >
> > It seems to me the complex part is going to be determining if any thread
> > already has the a lock on the object from the patterns.  I can dig into
> > that to see if I can get an early prototype of lock tracking.
> >
> > Claude
> >
> > --
> > I like: Like Like - The likeliest place on the web
> > <http://like-like.xenei.com>
> > LinkedIn: http://www.linkedin.com/in/claudewarren
>
>


-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: A proposal for a new locking strategy

Posted by "A. Soroka" <aj...@virginia.edu>.
I’d like to try to kick this thread around a bit, hopefully in a helpful way. I’ve been thinking about an idea that Andy has tossed around a bit: that Models, Graphs, etc. could just be views on underlying Datasets. I’ve been trying to understand how this would work in connection with Claude’s ideas about fine-grained, possibly nested locking. If I understand both ideas correctly, it seems to me that the following result would hold in a Jena that takes up both:

Locks on any kind of object other than an underlying Dataset would be “views” (so to speak) over locks on the underlying Dataset. This would ensure that locks made via a view are visible to other views and the underlying Dataset and that locks made on the underlying Dataset are visible to all views.
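To make that concrete, here is a minimal sketch of what I mean, assuming the existing org.apache.jena.shared.Lock interface and a DatasetGraph underneath; the class name is illustrative only, not a proposed API:

{noformat}
// Sketch only: a view's lock is nothing more than the underlying dataset's lock,
// so critical sections entered via any view are visible to every other view.
import org.apache.jena.shared.Lock;
import org.apache.jena.sparql.core.DatasetGraph;

class GraphViewLock implements Lock {
    private final DatasetGraph underlying;

    GraphViewLock(DatasetGraph underlying) { this.underlying = underlying; }

    @Override
    public void enterCriticalSection(boolean readLockRequested) {
        // Delegate straight to the dataset: no separate per-view lock state.
        underlying.getLock().enterCriticalSection(readLockRequested);
    }

    @Override
    public void leaveCriticalSection() {
        underlying.getLock().leaveCriticalSection();
    }
}
{noformat}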

Does that claim seem reasonable (especially to Andy and Claude)?

---
A. Soroka
The University of Virginia Library

> On Jan 3, 2016, at 8:45 AM, Claude Warren <cl...@xenei.com> wrote:
> 
> Andy,
> I have not read trough the links in your last post yet but thought I might
> answer the questions you posed as per my thoughts prior to reading those
> posts. (concepts subject to change)
> 
> 
>> As I understand it, the application is now involved to describe the part
>> of the graph to lock. Is that right?
>> 
> 
> That was my thought yes.  I was only looking at how to lock for write.  I
> had not considered read locking, though I expected it would work like the
> current lock does in the sense that read locks block write locks.  Though I
> suppose it might be possible in some situations to specify what the thread
> was going to read.
> 
> 
>> 
>> To check here - locks are held for a critical section (as current design)?
>> Not a single API call? Something else?
>> 
>> The permissions system is per API call but then permissions are not
>> changing like data is on a live system.
>> 
>> 
> I was thinking that they would be held for a critical section only as
> currently done.
> 
> I agree there are significant distinctions between the permissions and
> locking as the permissions data does not change within a single call.
> 
> 
> 
>> 
>> Does failure mean an exception or waiting?
>> 
>> 
> My thought was failure/exception.  As you point out waiting leads to
> deadlock issues.  However, the decision to wait or fail could be
> implemented outside of the engine that determines if the lock can be
> granted.  So it could be a plugable decision.  For the first cut I would
> simply implement failure.
> 
> But looking at the enterCriticalSection() code I see that it does not allow
> failure, but that there is a mention of an error if a read lock is promoted
> to a write lock.  I am unsure how to handle this case.  I think that a lock
> failure exception may be needed.
> 
> A possible solution is to delay and retry with a retry time out/limit
> before throwing a timeout type exception.  This area needs work.
> 
> 
>> 
>> These tests themselves have to be done by some synchronized code becaue
>> there are multiple tests while other threads may be changing things at the
>> same time.
>> 
>> 
> agreed.  The internals of the locking code need synchronization.
> 
> 
>> How far back are you going to wind back to? In a transaction system either
>> the transaction is aborted (so all the way back); some system retry (i.e.
>> rerun application code).
>> 
> 
> I had assumed that the application would lock all the triples it needed at
> the start of the lock.  this is supposed to be a small critical section.
> However, I can see how nested class method calls might cause multiple locks.
> 
> So lets assume code execution like:
> {noformat}
> {
>    try {
>        Lock( <foo ANY ANY> )
>        // some processing here
>        try {
>            Lock( <bar ANY ANY >)
>            // some processing here
>        } catch (LockFailedException e)  {
>            // handle "bar" lock failure
>        } finally {
>            Unlock( <bar ANY ANY > )
>        }
>    } catch (LockFailedException e ) {
>        // handle "foo" lock failure
>    } finally {
>        Unlock( <foo, ANY ANY > )
>   }
>   // for grins ;)
>   Unlock(); // release all held locks.
> }
> 
> {noformat}
> 
> The partial solution is to require lock ordering - locks must be acquired
>> in the same order and same class - and for a sequence of operations, not
>> just a single access.
>> 
>> The application is now involved - this is a specialized programming model
>> for the ability to have multiple writers.
>> 
> 
> If we consider the lock request to be a set of patterns then the lock
> itself holds a set of patterns.  For example Lock( <foo ANY ANY> <bar ANY
> ANY> ) would create a lock holding the fooAny and barAny triples.
> Subsequently calling Lock( <baz ANY ANY> ) would result in the set: <foo
> ANY ANY> <bar ANY ANY> <baz ANY ANY>.  Calling Unlock( <bar ANY ANY> )
> would result in the set: <foo ANY ANY> <baz ANY ANY>.
> 
> So locks for a single thread do not need to be released in the order they
> were acquired.  But all the locks must be released when the critical
> section exits.
> 
> 
> 
>> It would be a good thing to have as an additional choice of graph (ideally
>> dataset) implementation.  A multi-writer dataset, then just make graph
>> views of the dataset, would great to have.
>> 
>> Is it easy to build an experimental version using the permissions code?
>> 
>>        Andy
>> 
>> 
> I'm not sure how much of the permissions code would be applicable.  It uses
> dynamic proxies to intercept graph and model (and assorted helper classes)
> method calls.
> 
> I think the locking engine is more likely like a query engine
> implementation having to do pattern matching on sets of triples.
> 
> I think that the starting place may be to extend the Lock interface to
> include:
> enterCriticalSection( triple .... );  // establish a lock for the triples
> Graph getLockPattern(); // get the lock pattern
> void lock( triple ... ); // add to lock pattern
> void unlock( triple ...); // remove from lock pattern
> void unlock(); // remove all locks
> 
> It might make more sense to put the last 4 into a separate interface and
> have a method to return an instance of that interface.  Something like:
> 
> LockHoldings getLocks();
> 
> 
> It seems to me the complex part is going to be determining if any thread
> already has the a lock on the object from the patterns.  I can dig into
> that to see if I can get an early prototype of lock tracking.
> 
> Claude
> 
> -- 
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren


Re: A proposal for a new locking strategy

Posted by Claude Warren <cl...@xenei.com>.
Andy,
I have not read through the links in your last post yet but thought I might
answer the questions you posed as per my thoughts prior to reading those
posts. (concepts subject to change)


> As I understand it, the application is now involved to describe the part
> of the graph to lock. Is that right?
>

That was my thought yes.  I was only looking at how to lock for write.  I
had not considered read locking, though I expected it would work like the
current lock does in the sense that read locks block write locks.  Though I
suppose it might be possible in some situations to specify what the thread
was going to read.


>
> To check here - locks are held for a critical section (as current design)?
> Not a single API call? Something else?
>
> The permissions system is per API call but then permissions are not
> changing like data is on a live system.
>
>
I was thinking that they would be held for a critical section only as
currently done.
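(For reference, the current whole-model usage pattern I have in mind - the standard critical-section idiom on a Model, which already extends Lock:)

{noformat}
// Current style: lock the whole model for the duration of a critical section.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.shared.Lock;

void update(Model model) {
    model.enterCriticalSection(Lock.WRITE);   // blocks until the write lock is granted
    try {
        // ... modify the model here ...
    } finally {
        model.leaveCriticalSection();         // always release the lock
    }
}
{noformat}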

I agree there are significant distinctions between the permissions and
locking as the permissions data does not change within a single call.



>
> Does failure mean an exception or waiting?
>
>
My thought was failure/exception.  As you point out waiting leads to
deadlock issues.  However, the decision to wait or fail could be
implemented outside of the engine that determines if the lock can be
granted.  So it could be a pluggable decision.  For the first cut I would
simply implement failure.

But looking at the enterCriticalSection() code I see that it does not allow
failure, but that there is a mention of an error if a read lock is promoted
to a write lock.  I am unsure how to handle this case.  I think that a lock
failure exception may be needed.

A possible solution is to delay and retry with a retry time out/limit
before throwing a timeout type exception.  This area needs work.
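(One possible shape for that, purely as a sketch - PatternLock, tryLock() and LockFailedException are invented names here, not existing Jena API:)

{noformat}
// Sketch: retry a pattern lock a few times before giving up with an exception.
import org.apache.jena.graph.Triple;

boolean acquireWithRetry(PatternLock lock, Triple pattern) throws LockFailedException {
    final int maxAttempts = 5;
    final long delayMillis = 50;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        if (lock.tryLock(pattern)) {           // hypothetical non-blocking attempt
            return true;
        }
        try {
            Thread.sleep(delayMillis);         // back off before retrying
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            break;
        }
    }
    throw new LockFailedException("could not lock " + pattern
            + " after " + maxAttempts + " attempts");
}
{noformat}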


>
> These tests themselves have to be done by some synchronized code becaue
> there are multiple tests while other threads may be changing things at the
> same time.
>
>
Agreed.  The internals of the locking code need synchronization.


> How far back are you going to wind back to? In a transaction system either
> the transaction is aborted (so all the way back); some system retry (i.e.
> rerun application code).
>

I had assumed that the application would lock all the triples it needed at
the start of the lock.  This is supposed to be a small critical section.
However, I can see how nested class method calls might cause multiple locks.

So let's assume code execution like:
{noformat}
{
    try {
        Lock( <foo ANY ANY> )
        // some processing here
        try {
            Lock( <bar ANY ANY >)
            // some processing here
        } catch (LockFailedException e)  {
            // handle "bar" lock failure
        } finally {
            Unlock( <bar ANY ANY > )
        }
    } catch (LockFailedException e ) {
        // handle "foo" lock failure
    } finally {
        Unlock( <foo ANY ANY > )
    }
    // for grins ;)
    Unlock(); // release all held locks.
}

{noformat}

The partial solution is to require lock ordering - locks must be acquired
> in the same order and same class - and for a sequence of operations, not
> just a single access.
>
> The application is now involved - this is a specialized programming model
> for the ability to have multiple writers.
>

If we consider the lock request to be a set of patterns then the lock
itself holds a set of patterns.  For example Lock( <foo ANY ANY> <bar ANY
ANY> ) would create a lock holding the fooAny and barAny triples.
Subsequently calling Lock( <baz ANY ANY> ) would result in the set: <foo
ANY ANY> <bar ANY ANY> <baz ANY ANY>.  Calling Unlock( <bar ANY ANY> )
would result in the set: <foo ANY ANY> <baz ANY ANY>.

So locks for a single thread do not need to be released in the order they
were acquired.  But all the locks must be released when the critical
section exits.
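(The per-thread bookkeeping could then be as simple as a set of patterns - a sketch, with invented names:)

{noformat}
// Sketch: the patterns a single thread currently holds.
import java.util.HashSet;
import java.util.Set;
import org.apache.jena.graph.Triple;

class HeldPatterns {
    private final Set<Triple> patterns = new HashSet<>();

    void lock(Triple... requested) {
        // Conflict checking against other threads' holdings is omitted here.
        for (Triple t : requested) {
            patterns.add(t);
        }
    }

    void unlock(Triple... released) {
        for (Triple t : released) {
            patterns.remove(t);
        }
    }

    void unlockAll() {
        patterns.clear();   // called when the critical section exits
    }
}
{noformat}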



> It would be a good thing to have as an additional choice of graph (ideally
> dataset) implementation.  A multi-writer dataset, then just make graph
> views of the dataset, would great to have.
>
> Is it easy to build an experimental version using the permissions code?
>
>         Andy
>
>
I'm not sure how much of the permissions code would be applicable.  It uses
dynamic proxies to intercept graph and model (and assorted helper classes)
method calls.

I think the locking engine is more likely like a query engine
implementation having to do pattern matching on sets of triples.

I think that the starting place may be to extend the Lock interface to
include:
enterCriticalSection( triple .... );  // establish a lock for the triples
Graph getLockPattern(); // get the lock pattern
void lock( triple ... ); // add to lock pattern
void unlock( triple ...); // remove from lock pattern
void unlock(); // remove all locks

It might make more sense to put the last 4 into a separate interface and
have a method to return an instance of that interface.  Something like:

LockHoldings getLocks();
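Written out as Java, the strawman would look something like this (sketch only; none of these interfaces exist in Jena today):

{noformat}
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Triple;
import org.apache.jena.shared.Lock;

interface PatternLock extends Lock {
    void enterCriticalSection(Triple... patterns); // establish a lock for the triples
    LockHoldings getLocks();                       // the separate holdings interface
}

interface LockHoldings {
    Graph getLockPattern();            // get the lock pattern
    void lock(Triple... patterns);     // add to lock pattern
    void unlock(Triple... patterns);   // remove from lock pattern
    void unlock();                     // remove all locks
}
{noformat}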


It seems to me the complex part is going to be determining if any thread
already has a lock on the object from the patterns.  I can dig into
that to see if I can get an early prototype of lock tracking.
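(A first stab at the conflict test itself might be the following - a sketch that treats Node.ANY as a wildcard on either side:)

{noformat}
// Sketch: two patterns conflict if they could match a common triple, i.e. every
// position is ANY on at least one side or is the same concrete node on both.
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;

static boolean conflicts(Triple a, Triple b) {
    return overlaps(a.getSubject(), b.getSubject())
        && overlaps(a.getPredicate(), b.getPredicate())
        && overlaps(a.getObject(), b.getObject());
}

static boolean overlaps(Node x, Node y) {
    return Node.ANY.equals(x) || Node.ANY.equals(y) || x.equals(y);
}
{noformat}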

Claude

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On the specifics of the scheme and locking things in general.

There is a nice picture in:

http://sqlperformance.com/2014/04/t-sql-queries/the-read-committed-isolation-level

where the scan returns a view of the data that never actually exists. It 
misses one row and sees another twice.

Consistency matters. As a locking scheme this seems to be closely 
related to 2PL:

https://en.wikipedia.org/wiki/Two-phase_locking

Triple-level locking is sort of like read-committed. All the 
literature on read-committed and other inconsistencies applies.

https://en.wikipedia.org/wiki/Isolation_%28database_systems%29#Non-repeatable_reads

One thing I'm unclear about is what to lock: does the application say 
what to lock, or does the system lock as needed?

An example: all triples with a subject in one namespace:

{ ?s <p> ?o .
   FILTER regex (str(?s), "^http://example/namespace#")
}

The only way to find the smallest set of triples to lock is to execute the pattern itself.

Negation makes things even more interesting.

Some operations combine information from different parts of the graph, so 
taking out locks as execution progresses leads to consistency problems 
for aggregations like COUNT or SUM.

Locking on the whole data avoids this.

On 02/01/16 14:18, Claude Warren wrote:
> Currently most Jena implementations use a multiple read one write
> solution.

Transactions don't use those locks.

In transactions, locking isn't even touched. There is a cheap, imperfect 
Transactional based on locks, but it can't do abort.  Locking on its own 
isn't transactions.

Transactional systems like TxnMem and TDB do a form of whole-database 
isolation and give serializable (technical sense) semantics for SPARQL.
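(For contrast, the transactional style those systems expose - a standard usage sketch:)

{noformat}
// Whole-dataset transaction, as opposed to lock-based critical sections.
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.ReadWrite;

Dataset ds = DatasetFactory.createTxnMem();   // transactional in-memory dataset
ds.begin(ReadWrite.WRITE);
try {
    // ... read and write the dataset here ...
    ds.commit();                              // or ds.abort() to roll back
} finally {
    ds.end();
}
{noformat}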

> However, I think that it is possible (with minimal work) do
> provide a solution that would allow for multiple writers by using lower
> level locks.

As I understand it, the application is now involved to describe the part 
of the graph to lock. Is that right?

>
> I take inspiration from the Privileges code.  That code allows privileges
> to be determined down to the triple level.  Basically it does the following
> {noformat}
> start
>   |
>   v
> may user perform operation on graph? → (no) (restrict)
>   |
>   v
> (yes)
> may user perform operation on any triple in graph → (yes) (allow)
>   |
>   v
> (no)
> may user perform operation on the specific triple in graph → (yes) (allow)
>   |
>   v
> (no) (restrict)
> {noformat}
>
> My thought is that the locking may work much the same way.  Once one thread
> has the objects locked the any other thread may not lock the object.  The
> process would be something like:

To check here - locks are held for a critical section (as current 
design)? Not a single API call? Something else?

The permissions system is per API call but then permissions are not 
changing like data is on a live system.

>
> Graph locking would require exclusive lock or non-exclusive lock.  If the
> entire graph were to be locked for writing (as in the current system) then
> the request would be for an exclusive write-lock on the graph.  Once an
> exclusive write lock has been established no other write lock may be
> applied to the graph or any of its triples by any other thread.
>
> If a thread only wanted to lock part of the graph, for example all triples
> matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
> write lock on the graph.  It would then acquire an exclusive write lock on
> all triples matching <u:foo ANY ANY>.  Once that triple match lock was
> acquired no other thread would be able to lock any triple who's subject was
> u:foo.
>
> The lock request would need to contain the graph name and (in the case of a
> partial graph lock) a set of triple patterns to lock.  The flow for the
> lock would be something like:
>
> {noformat}
> start
>   |
>   v
> does the thread hold an exclusive graph lock → (yes) (success)
>   |
>   v
> (no)
> does the thread want an exclusive graph lock → (yes) (go to ex graph lock)
>   |
>   v
> (no)
> does the thread hold a non-exclusive graph lock → (no) (go to nonex graph
> lock)
>   |
>   v
> (yes) (lbl:lock acquired)
> can the thread acquire all the triple locks  → (yes) (success)
>   |
>   v
> (no) (failure)

Does failure mean an exception or waiting?

If it's an exception, what happens from the app point of view?

> (lbl: nonex graph lock)
> does any thread hold an exclusive graph lock → (yes) (failure)
>   |
>   v
> (no)
> acquire non-exclusive graph lock
> (goto lock acquired)
>
>
> (lbl: ex graph lock)
> does any thread hold an exclusive graph lock → (yes) (failure)
>   |
>   v
> (no)
> does any thread hold a non-exclusive graph lock → (yes) (failure)
>   |
>   v
> (no)
> acquire exclusive graph lock
> (success)

These tests themselves have to be done by some synchronized code because 
there are multiple tests while other threads may be changing things at 
the same time.

>
> {noformat}


If waiting, rather than aborting, then deadlock is possible. What's the 
plan for that?

(exclusive locks on triples)
Thread 1:
Lock triple A
... some app code ... if() involved ...
... decides it needs to ...
Lock triple B

whereas

Thread 2:
Lock triple B
... some app code ... if() involved ...
... decides it needs to ...
Lock triple A

and that happens in wall-clock time:

Thread 1:            Thread 2:
Lock triple A
                      Lock triple B
Lock triple B
                      Lock triple A

Thread 1 and 2 now wait forever.

How far back are you going to wind back? In a transaction system 
either the transaction is aborted (so all the way back), or the system 
retries (i.e. reruns the application code).

The partial solution is to require lock ordering - locks must be 
acquired in the same order and same class - and for a sequence of 
operations, not just a single access.

The application is now involved - this is a specialized programming 
model for the ability to have multiple writers.
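(A sketch of what "same order" could mean in practice: sort the requested patterns by some agreed total order before acquiring them. The PatternLock type and the comparator are illustrative only:)

{noformat}
// Sketch: acquire pattern locks in a canonical order so two threads never
// acquire the same pair of locks in opposite orders.
import java.util.Arrays;
import java.util.Comparator;
import org.apache.jena.graph.Triple;

static void lockInOrder(PatternLock lock, Triple... patterns) {
    Triple[] ordered = patterns.clone();
    // Any total order works, as long as every thread uses the same one.
    Arrays.sort(ordered, Comparator.comparing(Triple::toString));
    for (Triple t : ordered) {
        lock.lock(t);   // hypothetical single-pattern acquisition
    }
}
{noformat}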

> The permissions system uses an abstract engine to determine if the user has
> access to the triples.  For the locking mechanism the system needs to track
> graph locks and triple patterns locked.  If a new request for a triple
> pattern matches any existing (already locked) pattern the lock request
> fails.
>
> The simple releaseLock() will release all locks the thread holds.
>
> Note that the locking system does not check the graph being locked to see
> if the items exist in the graph it is simply tracking patterns of locks and
> determining if there are any conflicts between the patterns.
>
> Because this process can duplicate the current locking strategy it can be
> used as a drop in replacement in the current code.  So current code would
> continue to operate as it does currently but future development could be
> more sensitive to locking named graphs, and partial updates to provide
> multi-thread updates.
>
> Thoughts?

It would be a good thing to have as an additional choice of graph 
(ideally dataset) implementation.  A multi-writer dataset, with graph 
views simply made over the dataset, would be great to have.

Is it easy to build an experimental version using the permissions code?

> Claude
>

	Andy


Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On 02/01/16 20:13, Paul Houle wrote:
> I'd love to see RDF* and SPARQL* support in Jena but that might be too much
> to ask.

Submit a patch.  Better, build an experimental version of Jena - it is 
open source.

The << >> syntax is taken from early drafts of SPARQL 1.0. Remnants are 
still in the master grammar file, commented out with "#if 0".

(actually, it's more complicated than that :-) as the form of 
one-reification it uses requires some uniqueness of the reification ... 
mere details)

	Andy

>
> On Sat, Jan 2, 2016 at 3:09 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> On 02/01/16 19:36, Paul Houle wrote:
>>
>>> :s [] [] is a lot like a relational entity,  but I think the really
>>> interesting thing about the RDF model is the ability to create
>>> "post-relational" structures,  even if it does involve blank nodes.  The
>>> future is more like JSON-LD or the nested columnar model.
>>>
>>> In that context an entity can be a little bit more than just :s [] [] but
>>> could involve a hierarchical structure or ordered lists.  In the case of
>>> Freebase,  for instance,  you have the "mediator" or "CVT" nodes which
>>> form
>>> a bipartite graph with respect to entity nodes so it is a straightforward
>>> operation to cut out an entity and the CVTs around it.
>>>
>>> Lately I've been working on a framework which is a bit like the "boxes and
>>> line" products like Alteryx, KNIME, Actian -- those products are a dime a
>>> dozen but they are all based on a tabular data model and this one is
>>> passing small RDF graphs around,  so it supports the nested columnar
>>> model,
>>>    logic,  etc.  Pipelines like that rapidly become unintuitive and
>>> structurally unstable when joins get involved,  particular when they
>>> involve "parts" of something that is a clear conceptual entity.
>>>
>>> Obviously this thing is configured by an RDF graph,  because the point is
>>> not that you draw a data processing pipeline but that one of these data
>>> processing pipelines consumes schema information and a theory library to
>>> build a graph that describes what will be done to the instances.
>>>
>>> So there is a MetaFactory that picks apart the graph into subgraphs,
>>> feeds
>>> the subgraphs into the processing modules and then hooks them up in a
>>> communications fabric.
>>>
>>> I don't yet have a single strategy for doing the "document extraction" but
>>> I have two or three methods that between them seem to cover the cases that
>>> actually come up.
>>>
>>> Following this line,  it would be nice to be able to lock a whole
>>> structure
>>> that looks like
>>>
>>> [
>>>      a :Paper ;
>>>     :authors ("Alpher","Bethe","Gamow") ;
>>>     :publication [ :journal :PhysicalReview ; :year 1948 . ]
>>> ]
>>>
>>> I don't know how implementable such a thing is,  but the problem of
>>> drawing
>>> a line around a complex entity would be part of it.
>>>
>>
>> I have always thought that we need a type of property that expresses
>> "contains", or is part of an entity description, as well as datatype
>> properties for relationships between top-level entities.  They are a sort
>> of generalization of object properties.
>>
>> Or maybe a richer set of literals to include maps and proper lists. c.f.
>> Property graphs.
>>
>>          Andy
>>
>>
>>
>>> On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>> An SQL database row is a entity in the application data model. If you
>>>> model a person, you have one row, but in RDF you have several triples.
>>>> Triple level locking is analogous to cell level locking in SQL databases.
>>>>
>>>>           Andy
>>>>
>>>>
>>>> On 02/01/16 17:01, Paul Houle wrote:
>>>>
>>>> I think it is a worthwhile idea.  Given that you are still having to get
>>>>> a
>>>>> global lock to get a triple lock,  isn't there still a scaling limit on
>>>>> the
>>>>> global lock?
>>>>>
>>>>> I think a lot about the things that made the relational database
>>>>> approach
>>>>> so successful and certainly one thing is that row-level locking
>>>>> corresponds
>>>>> well to real-life access patterns.
>>>>>
>>>>> On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren <cl...@xenei.com> wrote:
>>>>>
>>>>> Currently most Jena implementations use a multiple read one write
>>>>>
>>>>>> solution.  However, I think that it is possible (with minimal work) do
>>>>>> provide a solution that would allow for multiple writers by using lower
>>>>>> level locks.
>>>>>>
>>>>>> I take inspiration from the Privileges code.  That code allows
>>>>>> privileges
>>>>>> to be determined down to the triple level.  Basically it does the
>>>>>> following
>>>>>> {noformat}
>>>>>> start
>>>>>>     |
>>>>>>     v
>>>>>> may user perform operation on graph? → (no) (restrict)
>>>>>>     |
>>>>>>     v
>>>>>> (yes)
>>>>>> may user perform operation on any triple in graph → (yes) (allow)
>>>>>>     |
>>>>>>     v
>>>>>> (no)
>>>>>> may user perform operation on the specific triple in graph → (yes)
>>>>>> (allow)
>>>>>>     |
>>>>>>     v
>>>>>> (no) (restrict)
>>>>>> {noformat}
>>>>>>
>>>>>> My thought is that the locking may work much the same way.  Once one
>>>>>> thread
>>>>>> has the objects locked the any other thread may not lock the object.
>>>>>> The
>>>>>> process would be something like:
>>>>>>
>>>>>> Graph locking would require exclusive lock or non-exclusive lock.  If
>>>>>> the
>>>>>> entire graph were to be locked for writing (as in the current system)
>>>>>> then
>>>>>> the request would be for an exclusive write-lock on the graph.  Once an
>>>>>> exclusive write lock has been established no other write lock may be
>>>>>> applied to the graph or any of its triples by any other thread.
>>>>>>
>>>>>> If a thread only wanted to lock part of the graph, for example all
>>>>>> triples
>>>>>> matching <u:foo ANY ANY>, the thread would first acquire a
>>>>>> non-exclusive
>>>>>> write lock on the graph.  It would then acquire an exclusive write lock
>>>>>> on
>>>>>> all triples matching <u:foo ANY ANY>.  Once that triple match lock was
>>>>>> acquired no other thread would be able to lock any triple who's subject
>>>>>> was
>>>>>> u:foo.
>>>>>>
>>>>>> The lock request would need to contain the graph name and (in the case
>>>>>> of a
>>>>>> partial graph lock) a set of triple patterns to lock.  The flow for the
>>>>>> lock would be something like:
>>>>>>
>>>>>> {noformat}
>>>>>> start
>>>>>>     |
>>>>>>     v
>>>>>> does the thread hold an exclusive graph lock → (yes) (success)
>>>>>>     |
>>>>>>     v
>>>>>> (no)
>>>>>> does the thread want an exclusive graph lock → (yes) (go to ex graph
>>>>>> lock)
>>>>>>     |
>>>>>>     v
>>>>>> (no)
>>>>>> does the thread hold a non-exclusive graph lock → (no) (go to nonex
>>>>>> graph
>>>>>> lock)
>>>>>>     |
>>>>>>     v
>>>>>> (yes) (lbl:lock acquired)
>>>>>> can the thread acquire all the triple locks  → (yes) (success)
>>>>>>     |
>>>>>>     v
>>>>>> (no) (failure)
>>>>>>
>>>>>>
>>>>>> (lbl: nonex graph lock)
>>>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>>>     |
>>>>>>     v
>>>>>> (no)
>>>>>> acquire non-exclusive graph lock
>>>>>> (goto lock acquired)
>>>>>>
>>>>>>
>>>>>> (lbl: ex graph lock)
>>>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>>>     |
>>>>>>     v
>>>>>> (no)
>>>>>> does any thread hold a non-exclusive graph lock → (yes) (failure)
>>>>>>     |
>>>>>>     v
>>>>>> (no)
>>>>>> acquire exclusive graph lock
>>>>>> (success)
>>>>>>
>>>>>> {noformat}
>>>>>>
>>>>>> The permissions system uses an abstract engine to determine if the user
>>>>>> has
>>>>>> access to the triples.  For the locking mechanism the system needs to
>>>>>> track
>>>>>> graph locks and triple patterns locked.  If a new request for a triple
>>>>>> pattern matches any existing (already locked) pattern the lock request
>>>>>> fails.
>>>>>>
>>>>>> The simple releaseLock() will release all locks the thread holds.
>>>>>>
>>>>>> Note that the locking system does not check the graph being locked to
>>>>>> see
>>>>>> if the items exist in the graph it is simply tracking patterns of locks
>>>>>> and
>>>>>> determining if there are any conflicts between the patterns.
>>>>>>
>>>>>> Because this process can duplicate the current locking strategy it can
>>>>>> be
>>>>>> used as a drop in replacement in the current code.  So current code
>>>>>> would
>>>>>> continue to operate as it does currently but future development could
>>>>>> be
>>>>>> more sensitive to locking named graphs, and partial updates to provide
>>>>>> multi-thread updates.
>>>>>>
>>>>>> Thoughts?
>>>>>> Claude
>>>>>>
>>>>>> --
>>>>>> I like: Like Like - The likeliest place on the web
>>>>>> <http://like-like.xenei.com>
>>>>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


Re: A proposal for a new locking strategy

Posted by Paul Houle <on...@gmail.com>.
I'd love to see RDF* and SPARQL* support in Jena but that might be too much
to ask.

On Sat, Jan 2, 2016 at 3:09 PM, Andy Seaborne <an...@apache.org> wrote:

> On 02/01/16 19:36, Paul Houle wrote:
>
>> :s [] [] is a lot like a relational entity,  but I think the really
>> interesting thing about the RDF model is the ability to create
>> "post-relational" structures,  even if it does involve blank nodes.  The
>> future is more like JSON-LD or the nested columnar model.
>>
>> In that context an entity can be a little bit more than just :s [] [] but
>> could involve a hierarchical structure or ordered lists.  In the case of
>> Freebase,  for instance,  you have the "mediator" or "CVT" nodes which
>> form
>> a bipartite graph with respect to entity nodes so it is a straightforward
>> operation to cut out an entity and the CVTs around it.
>>
>> Lately I've been working on a framework which is a bit like the "boxes and
>> line" products like Alteryx, KNIME, Actian -- those products are a dime a
>> dozen but they are all based on a tabular data model and this one is
>> passing small RDF graphs around,  so it supports the nested columnar
>> model,
>>   logic,  etc.  Pipelines like that rapidly become unintuitive and
>> structurally unstable when joins get involved,  particular when they
>> involve "parts" of something that is a clear conceptual entity.
>>
>> Obviously this thing is configured by an RDF graph,  because the point is
>> not that you draw a data processing pipeline but that one of these data
>> processing pipelines consumes schema information and a theory library to
>> build a graph that describes what will be done to the instances.
>>
>> So there is a MetaFactory that picks apart the graph into subgraphs,
>> feeds
>> the subgraphs into the processing modules and then hooks them up in a
>> communications fabric.
>>
>> I don't yet have a single strategy for doing the "document extraction" but
>> I have two or three methods that between them seem to cover the cases that
>> actually come up.
>>
>> Following this line,  it would be nice to be able to lock a whole
>> structure
>> that looks like
>>
>> [
>>     a :Paper ;
>>    :authors ("Alpher","Bethe","Gamow") ;
>>    :publication [ :journal :PhysicalReview ; :year 1948 . ]
>> ]
>>
>> I don't know how implementable such a thing is,  but the problem of
>> drawing
>> a line around a complex entity would be part of it.
>>
>
> I have always thought that we need a type of property that expresses
> "contains", or is part of an entity description, as well as datatype
> properties for relationships between top-level entities.  They are a sort
> of generalization of object properties.
>
> Or maybe a richer set of literals to include maps and proper lists. c.f.
> Property graphs.
>
>         Andy
>
>
>
>> On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne <an...@apache.org> wrote:
>>
>> An SQL database row is a entity in the application data model. If you
>>> model a person, you have one row, but in RDF you have several triples.
>>> Triple level locking is analogous to cell level locking in SQL databases.
>>>
>>>          Andy
>>>
>>>
>>> On 02/01/16 17:01, Paul Houle wrote:
>>>
>>> I think it is a worthwhile idea.  Given that you are still having to get
>>>> a
>>>> global lock to get a triple lock,  isn't there still a scaling limit on
>>>> the
>>>> global lock?
>>>>
>>>> I think a lot about the things that made the relational database
>>>> approach
>>>> so successful and certainly one thing is that row-level locking
>>>> corresponds
>>>> well to real-life access patterns.
>>>>
>>>> On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren <cl...@xenei.com> wrote:
>>>>
>>>> Currently most Jena implementations use a multiple read one write
>>>>
>>>>> solution.  However, I think that it is possible (with minimal work) do
>>>>> provide a solution that would allow for multiple writers by using lower
>>>>> level locks.
>>>>>
>>>>> I take inspiration from the Privileges code.  That code allows
>>>>> privileges
>>>>> to be determined down to the triple level.  Basically it does the
>>>>> following
>>>>> {noformat}
>>>>> start
>>>>>    |
>>>>>    v
>>>>> may user perform operation on graph? → (no) (restrict)
>>>>>    |
>>>>>    v
>>>>> (yes)
>>>>> may user perform operation on any triple in graph → (yes) (allow)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> may user perform operation on the specific triple in graph → (yes)
>>>>> (allow)
>>>>>    |
>>>>>    v
>>>>> (no) (restrict)
>>>>> {noformat}
>>>>>
>>>>> My thought is that the locking may work much the same way.  Once one
>>>>> thread
>>>>> has the objects locked the any other thread may not lock the object.
>>>>> The
>>>>> process would be something like:
>>>>>
>>>>> Graph locking would require exclusive lock or non-exclusive lock.  If
>>>>> the
>>>>> entire graph were to be locked for writing (as in the current system)
>>>>> then
>>>>> the request would be for an exclusive write-lock on the graph.  Once an
>>>>> exclusive write lock has been established no other write lock may be
>>>>> applied to the graph or any of its triples by any other thread.
>>>>>
>>>>> If a thread only wanted to lock part of the graph, for example all
>>>>> triples
>>>>> matching <u:foo ANY ANY>, the thread would first acquire a
>>>>> non-exclusive
>>>>> write lock on the graph.  It would then acquire an exclusive write lock
>>>>> on
>>>>> all triples matching <u:foo ANY ANY>.  Once that triple match lock was
>>>>> acquired no other thread would be able to lock any triple who's subject
>>>>> was
>>>>> u:foo.
>>>>>
>>>>> The lock request would need to contain the graph name and (in the case
>>>>> of a
>>>>> partial graph lock) a set of triple patterns to lock.  The flow for the
>>>>> lock would be something like:
>>>>>
>>>>> {noformat}
>>>>> start
>>>>>    |
>>>>>    v
>>>>> does the thread hold an exclusive graph lock → (yes) (success)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> does the thread want an exclusive graph lock → (yes) (go to ex graph
>>>>> lock)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> does the thread hold a non-exclusive graph lock → (no) (go to nonex
>>>>> graph
>>>>> lock)
>>>>>    |
>>>>>    v
>>>>> (yes) (lbl:lock acquired)
>>>>> can the thread acquire all the triple locks  → (yes) (success)
>>>>>    |
>>>>>    v
>>>>> (no) (failure)
>>>>>
>>>>>
>>>>> (lbl: nonex graph lock)
>>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> acquire non-exclusive graph lock
>>>>> (goto lock acquired)
>>>>>
>>>>>
>>>>> (lbl: ex graph lock)
>>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> does any thread hold a non-exclusive graph lock → (yes) (failure)
>>>>>    |
>>>>>    v
>>>>> (no)
>>>>> acquire exclusive graph lock
>>>>> (success)
>>>>>
>>>>> {noformat}
>>>>>
>>>>> The permissions system uses an abstract engine to determine if the user
>>>>> has
>>>>> access to the triples.  For the locking mechanism the system needs to
>>>>> track
>>>>> graph locks and triple patterns locked.  If a new request for a triple
>>>>> pattern matches any existing (already locked) pattern the lock request
>>>>> fails.
>>>>>
>>>>> The simple releaseLock() will release all locks the thread holds.
>>>>>
>>>>> Note that the locking system does not check the graph being locked to
>>>>> see
>>>>> if the items exist in the graph it is simply tracking patterns of locks
>>>>> and
>>>>> determining if there are any conflicts between the patterns.
>>>>>
>>>>> Because this process can duplicate the current locking strategy it can
>>>>> be
>>>>> used as a drop in replacement in the current code.  So current code
>>>>> would
>>>>> continue to operate as it does currently but future development could
>>>>> be
>>>>> more sensitive to locking named graphs, and partial updates to provide
>>>>> multi-thread updates.
>>>>>
>>>>> Thoughts?
>>>>> Claude
>>>>>
>>>>> --
>>>>> I like: Like Like - The likeliest place on the web
>>>>> <http://like-like.xenei.com>
>>>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
<http://legalentityidentifier.info/lei/lookup/>

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275

Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On 02/01/16 19:36, Paul Houle wrote:
> :s [] [] is a lot like a relational entity,  but I think the really
> interesting thing about the RDF model is the ability to create
> "post-relational" structures,  even if it does involve blank nodes.  The
> future is more like JSON-LD or the nested columnar model.
>
> In that context an entity can be a little bit more than just :s [] [] but
> could involve a hierarchical structure or ordered lists.  In the case of
> Freebase,  for instance,  you have the "mediator" or "CVT" nodes which form
> a bipartite graph with respect to entity nodes so it is a straightforward
> operation to cut out an entity and the CVTs around it.
>
> Lately I've been working on a framework which is a bit like the "boxes and
> line" products like Alteryx, KNIME, Actian -- those products are a dime a
> dozen but they are all based on a tabular data model and this one is
> passing small RDF graphs around,  so it supports the nested columnar model,
>   logic,  etc.  Pipelines like that rapidly become unintuitive and
> structurally unstable when joins get involved,  particular when they
> involve "parts" of something that is a clear conceptual entity.
>
> Obviously this thing is configured by an RDF graph,  because the point is
> not that you draw a data processing pipeline but that one of these data
> processing pipelines consumes schema information and a theory library to
> build a graph that describes what will be done to the instances.
>
> So there is a MetaFactory that picks apart the graph into subgraphs,  feeds
> the subgraphs into the processing modules and then hooks them up in a
> communications fabric.
>
> I don't yet have a single strategy for doing the "document extraction" but
> I have two or three methods that between them seem to cover the cases that
> actually come up.
>
> Following this line,  it would be nice to be able to lock a whole structure
> that looks like
>
> [
>     a :Paper ;
>    :authors ("Alpher","Bethe","Gamow") ;
>    :publication [ :journal :PhysicalReview ; :year 1948 . ]
> ]
>
> I don't know how implementable such a thing is,  but the problem of drawing
> a line around a complex entity would be part of it.

I have always thought that we need a type of property that expresses 
"contains", or is part of an entity description, as well as datatype 
properties for relationships between top-level entities.  They are a 
sort of generalization of object properties.

Or maybe a richer set of literals to include maps and proper lists. 
c.f. Property graphs.

	Andy

>
> On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> An SQL database row is a entity in the application data model. If you
>> model a person, you have one row, but in RDF you have several triples.
>> Triple level locking is analogous to cell level locking in SQL databases.
>>
>>          Andy
>>
>>
>> On 02/01/16 17:01, Paul Houle wrote:
>>
>>> I think it is a worthwhile idea.  Given that you are still having to get a
>>> global lock to get a triple lock,  isn't there still a scaling limit on
>>> the
>>> global lock?
>>>
>>> I think a lot about the things that made the relational database approach
>>> so successful and certainly one thing is that row-level locking
>>> corresponds
>>> well to real-life access patterns.
>>>
>>> On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren <cl...@xenei.com> wrote:
>>>
>>> Currently most Jena implementations use a multiple read one write
>>>> solution.  However, I think that it is possible (with minimal work) do
>>>> provide a solution that would allow for multiple writers by using lower
>>>> level locks.
>>>>
>>>> I take inspiration from the Privileges code.  That code allows privileges
>>>> to be determined down to the triple level.  Basically it does the
>>>> following
>>>> {noformat}
>>>> start
>>>>    |
>>>>    v
>>>> may user perform operation on graph? → (no) (restrict)
>>>>    |
>>>>    v
>>>> (yes)
>>>> may user perform operation on any triple in graph → (yes) (allow)
>>>>    |
>>>>    v
>>>> (no)
>>>> may user perform operation on the specific triple in graph → (yes)
>>>> (allow)
>>>>    |
>>>>    v
>>>> (no) (restrict)
>>>> {noformat}
>>>>
>>>> My thought is that the locking may work much the same way.  Once one
>>>> thread
>>>> has the objects locked the any other thread may not lock the object.  The
>>>> process would be something like:
>>>>
>>>> Graph locking would require exclusive lock or non-exclusive lock.  If the
>>>> entire graph were to be locked for writing (as in the current system)
>>>> then
>>>> the request would be for an exclusive write-lock on the graph.  Once an
>>>> exclusive write lock has been established no other write lock may be
>>>> applied to the graph or any of its triples by any other thread.
>>>>
>>>> If a thread only wanted to lock part of the graph, for example all
>>>> triples
>>>> matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
>>>> write lock on the graph.  It would then acquire an exclusive write lock
>>>> on
>>>> all triples matching <u:foo ANY ANY>.  Once that triple match lock was
>>>> acquired no other thread would be able to lock any triple who's subject
>>>> was
>>>> u:foo.
>>>>
>>>> The lock request would need to contain the graph name and (in the case
>>>> of a
>>>> partial graph lock) a set of triple patterns to lock.  The flow for the
>>>> lock would be something like:
>>>>
>>>> {noformat}
>>>> start
>>>>    |
>>>>    v
>>>> does the thread hold an exclusive graph lock → (yes) (success)
>>>>    |
>>>>    v
>>>> (no)
>>>> does the thread want an exclusive graph lock → (yes) (go to ex graph
>>>> lock)
>>>>    |
>>>>    v
>>>> (no)
>>>> does the thread hold a non-exclusive graph lock → (no) (go to nonex graph
>>>> lock)
>>>>    |
>>>>    v
>>>> (yes) (lbl:lock acquired)
>>>> can the thread acquire all the triple locks  → (yes) (success)
>>>>    |
>>>>    v
>>>> (no) (failure)
>>>>
>>>>
>>>> (lbl: nonex graph lock)
>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>    |
>>>>    v
>>>> (no)
>>>> acquire non-exclusive graph lock
>>>> (goto lock acquired)
>>>>
>>>>
>>>> (lbl: ex graph lock)
>>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>>    |
>>>>    v
>>>> (no)
>>>> does any thread hold a non-exclusive graph lock → (yes) (failure)
>>>>    |
>>>>    v
>>>> (no)
>>>> acquire exclusive graph lock
>>>> (success)
>>>>
>>>> {noformat}
>>>>
>>>> The permissions system uses an abstract engine to determine if the user
>>>> has
>>>> access to the triples.  For the locking mechanism the system needs to
>>>> track
>>>> graph locks and triple patterns locked.  If a new request for a triple
>>>> pattern matches any existing (already locked) pattern the lock request
>>>> fails.
>>>>
>>>> The simple releaseLock() will release all locks the thread holds.
>>>>
>>>> Note that the locking system does not check the graph being locked to see
>>>> if the items exist in the graph it is simply tracking patterns of locks
>>>> and
>>>> determining if there are any conflicts between the patterns.
>>>>
>>>> Because this process can duplicate the current locking strategy it can be
>>>> used as a drop in replacement in the current code.  So current code would
>>>> continue to operate as it does currently but future development could be
>>>> more sensitive to locking named graphs, and partial updates to provide
>>>> multi-thread updates.
>>>>
>>>> Thoughts?
>>>> Claude
>>>>
>>>> --
>>>> I like: Like Like - The likeliest place on the web
>>>> <http://like-like.xenei.com>
>>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>>
>>>>
>>>
>>>
>>>
>>
>
>


Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
On 02/01/16 19:36, Paul Houle wrote:
...
> Following this line,  it would be nice to be able to lock a whole structure
> that looks like
>
> [
>     a :Paper ;
>    :authors ("Alpher","Bethe","Gamow") ;
>    :publication [ :journal :PhysicalReview ; :year 1948 . ]
> ]
>
> I don't know how implementable such a thing is,  but the problem of drawing
> a line around a complex entity would be part of it.
>

Have you explored using one named graph for each entity and then using a 
default union graph?
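(A sketch of that arrangement, using ARQ's built-in union graph name:)

{noformat}
// One named graph per entity, queried through the union of all named graphs.
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;

Dataset ds = DatasetFactory.createTxnMem();
Model paper = ds.getNamedModel("http://example/entity/paper1");  // one entity, one graph
// ... add the entity's triples to 'paper' ...

// <urn:x-arq:UnionGraph> is ARQ's name for the union of the named graphs.
String q = "SELECT * { GRAPH <urn:x-arq:UnionGraph> { ?s ?p ?o } }";
QueryExecutionFactory.create(q, ds).execSelect();
{noformat}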

	Andy


Re: A proposal for a new locking strategy

Posted by Paul Houle <on...@gmail.com>.
:s [] [] is a lot like a relational entity,  but I think the really
interesting thing about the RDF model is the ability to create
"post-relational" structures,  even if it does involve blank nodes.  The
future is more like JSON-LD or the nested columnar model.

In that context an entity can be a little bit more than just :s [] [] but
could involve a hierarchical structure or ordered lists.  In the case of
Freebase,  for instance,  you have the "mediator" or "CVT" nodes which form
a bipartite graph with respect to entity nodes so it is a straightforward
operation to cut out an entity and the CVTs around it.

Lately I've been working on a framework which is a bit like the "boxes and
line" products like Alteryx, KNIME, Actian -- those products are a dime a
dozen but they are all based on a tabular data model and this one is
passing small RDF graphs around,  so it supports the nested columnar model,
 logic,  etc.  Pipelines like that rapidly become unintuitive and
structurally unstable when joins get involved,  particularly when they
involve "parts" of something that is a clear conceptual entity.

Obviously this thing is configured by an RDF graph,  because the point is
not that you draw a data processing pipeline but that one of these data
processing pipelines consumes schema information and a theory library to
build a graph that describes what will be done to the instances.

So there is a MetaFactory that picks apart the graph into subgraphs,  feeds
the subgraphs into the processing modules and then hooks them up in a
communications fabric.

I don't yet have a single strategy for doing the "document extraction" but
I have two or three methods that between them seem to cover the cases that
actually come up.

Following this line,  it would be nice to be able to lock a whole structure
that looks like

[
   a :Paper ;
  :authors ("Alpher","Bethe","Gamow") ;
  :publication [ :journal :PhysicalReview ; :year 1948 . ]
]

I don't know how implementable such a thing is,  but the problem of drawing
a line around a complex entity would be part of it.
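(One rough way to draw that line, for what it's worth: follow blank nodes out from the entity's root resource. Plain Model API, no cycle guard, a sketch only:)

{noformat}
// Sketch: collect an entity's statements by descending into blank-node structure.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

static Model entityBoundary(Model source, Resource root) {
    Model entity = ModelFactory.createDefaultModel();
    collect(source, root, entity);
    return entity;
}

static void collect(Model source, Resource subject, Model entity) {
    StmtIterator it = source.listStatements(subject, null, (RDFNode) null);
    while (it.hasNext()) {
        Statement stmt = it.next();
        entity.add(stmt);
        RDFNode obj = stmt.getObject();
        if (obj.isAnon()) {               // descend into nested blank nodes
            collect(source, obj.asResource(), entity);
        }
    }
}
{noformat}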

On Sat, Jan 2, 2016 at 1:08 PM, Andy Seaborne <an...@apache.org> wrote:

> An SQL database row is a entity in the application data model. If you
> model a person, you have one row, but in RDF you have several triples.
> Triple level locking is analogous to cell level locking in SQL databases.
>
>         Andy
>
>
> On 02/01/16 17:01, Paul Houle wrote:
>
>> I think it is a worthwhile idea.  Given that you are still having to get a
>> global lock to get a triple lock,  isn't there still a scaling limit on
>> the
>> global lock?
>>
>> I think a lot about the things that made the relational database approach
>> so successful and certainly one thing is that row-level locking
>> corresponds
>> well to real-life access patterns.
>>
>> On Sat, Jan 2, 2016 at 9:18 AM, Claude Warren <cl...@xenei.com> wrote:
>>
>> Currently most Jena implementations use a multiple read one write
>>> solution.  However, I think that it is possible (with minimal work) do
>>> provide a solution that would allow for multiple writers by using lower
>>> level locks.
>>>
>>> I take inspiration from the Privileges code.  That code allows privileges
>>> to be determined down to the triple level.  Basically it does the
>>> following
>>> {noformat}
>>> start
>>>   |
>>>   v
>>> may user perform operation on graph? → (no) (restrict)
>>>   |
>>>   v
>>> (yes)
>>> may user perform operation on any triple in graph → (yes) (allow)
>>>   |
>>>   v
>>> (no)
>>> may user perform operation on the specific triple in graph → (yes)
>>> (allow)
>>>   |
>>>   v
>>> (no) (restrict)
>>> {noformat}
>>>
>>> My thought is that the locking may work much the same way.  Once one
>>> thread
>>> has the objects locked the any other thread may not lock the object.  The
>>> process would be something like:
>>>
>>> Graph locking would require exclusive lock or non-exclusive lock.  If the
>>> entire graph were to be locked for writing (as in the current system)
>>> then
>>> the request would be for an exclusive write-lock on the graph.  Once an
>>> exclusive write lock has been established no other write lock may be
>>> applied to the graph or any of its triples by any other thread.
>>>
>>> If a thread only wanted to lock part of the graph, for example all
>>> triples
>>> matching <u:foo ANY ANY>, the thread would first acquire a non-exclusive
>>> write lock on the graph.  It would then acquire an exclusive write lock
>>> on
>>> all triples matching <u:foo ANY ANY>.  Once that triple match lock was
>>> acquired no other thread would be able to lock any triple who's subject
>>> was
>>> u:foo.
>>>
>>> The lock request would need to contain the graph name and (in the case
>>> of a
>>> partial graph lock) a set of triple patterns to lock.  The flow for the
>>> lock would be something like:
>>>
>>> {noformat}
>>> start
>>>   |
>>>   v
>>> does the thread hold an exclusive graph lock → (yes) (success)
>>>   |
>>>   v
>>> (no)
>>> does the thread want an exclusive graph lock → (yes) (go to ex graph
>>> lock)
>>>   |
>>>   v
>>> (no)
>>> does the thread hold a non-exclusive graph lock → (no) (go to nonex graph
>>> lock)
>>>   |
>>>   v
>>> (yes) (lbl:lock acquired)
>>> can the thread acquire all the triple locks  → (yes) (success)
>>>   |
>>>   v
>>> (no) (failure)
>>>
>>>
>>> (lbl: nonex graph lock)
>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>   |
>>>   v
>>> (no)
>>> acquire non-exclusive graph lock
>>> (goto lock acquired)
>>>
>>>
>>> (lbl: ex graph lock)
>>> does any thread hold an exclusive graph lock → (yes) (failure)
>>>   |
>>>   v
>>> (no)
>>> does any thread hold a non-exclusive graph lock → (yes) (failure)
>>>   |
>>>   v
>>> (no)
>>> acquire exclusive graph lock
>>> (success)
>>>
>>> {noformat}
>>>
>>> The permissions system uses an abstract engine to determine if the user
>>> has
>>> access to the triples.  For the locking mechanism the system needs to
>>> track
>>> graph locks and triple patterns locked.  If a new request for a triple
>>> pattern matches any existing (already locked) pattern the lock request
>>> fails.
>>>
>>> The simple releaseLock() will release all locks the thread holds.
>>>
>>> Note that the locking system does not check the graph being locked to see
>>> if the items exist in the graph it is simply tracking patterns of locks
>>> and
>>> determining if there are any conflicts between the patterns.
>>>
>>> Because this process can duplicate the current locking strategy it can be
>>> used as a drop in replacement in the current code.  So current code would
>>> continue to operate as it does currently but future development could be
>>> more sensitive to locking named graphs, and partial updates to provide
>>> multi-thread updates.
>>>
>>> Thoughts?
>>> Claude
>>>
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>>>
>>>
>>
>>
>>
>


-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
<http://legalentityidentifier.info/lei/lookup/>

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275

Re: A proposal for a new locking strategy

Posted by Andy Seaborne <an...@apache.org>.
An SQL database row is an entity in the application data model. If you 
model a person, you have one row, but in RDF you have several triples. 
Triple level locking is analogous to cell level locking in SQL databases.

	Andy

On 02/01/16 17:01, Paul Houle wrote:
> I think it is a worthwhile idea.  Given that you are still having to get a
> global lock to get a triple lock,  isn't there still a scaling limit on the
> global lock?
>
> I think a lot about the things that made the relational database approach
> so successful and certainly one thing is that row-level locking corresponds
> well to real-life access patterns.
>


Re: A proposal for a new locking strategy

Posted by Claude Warren <cl...@xenei.com>.
When working with triple-pattern locks the global lock is a shared lock,
so once it exists it is just a matter of incrementing/decrementing the
holder count.  But, as you say, it may still pose a scaling limit.

The global lock could be replaced with patterns only.  Acquiring the global
lock would then mean taking a lock on the pattern <ANY ANY ANY>.
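
A minimal sketch of what pattern-only locking might look like, in plain Java
with a hypothetical PatternLocks class and string-based patterns (none of
this is existing Jena code).  The global lock then becomes an exclusive lock
on <ANY ANY ANY>, which conflicts with every other pattern:

{noformat}
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of pattern-only locking: the "global" write lock is
 * simply an exclusive lock on the pattern <ANY ANY ANY>.
 */
public class PatternLocks {

    /** A triple pattern; null stands for ANY. */
    public static final class Pattern {
        final String s, p, o;
        Pattern(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }

        /** Two patterns conflict if some concrete triple could match both. */
        boolean conflictsWith(Pattern other) {
            return overlaps(s, other.s) && overlaps(p, other.p) && overlaps(o, other.o);
        }
        private static boolean overlaps(String a, String b) {
            return a == null || b == null || a.equals(b);
        }
    }

    /** The pattern <ANY ANY ANY>: locking it exclusively is the global lock. */
    public static final Pattern GLOBAL = new Pattern(null, null, null);

    private final List<Pattern> held = new ArrayList<Pattern>();

    /** Try to take an exclusive lock on a pattern; fail on any overlap. */
    public synchronized boolean tryLock(Pattern requested) {
        for (Pattern existing : held) {
            if (existing.conflictsWith(requested)) {
                return false;                 // overlaps an already-held lock
            }
        }
        held.add(requested);
        return true;
    }

    /** Drop every held lock (the proposal's releaseLock(), minus per-thread bookkeeping). */
    public synchronized void releaseAll() {
        held.clear();
    }

    public static void main(String[] args) {
        PatternLocks locks = new PatternLocks();
        System.out.println(locks.tryLock(new Pattern("u:foo", null, null))); // true
        System.out.println(locks.tryLock(new Pattern("u:bar", null, null))); // true: disjoint subjects
        System.out.println(locks.tryLock(GLOBAL));                           // false: <ANY ANY ANY> overlaps both
    }
}
{noformat}

A real implementation would also have to record which thread holds each
pattern and layer the shared graph lock with its holder count on top, as
described above.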

Claude

On Sat, Jan 2, 2016 at 5:01 PM, Paul Houle <on...@gmail.com> wrote:

> I think it is a worthwhile idea.  Given that you are still having to get a
> global lock to get a triple lock,  isn't there still a scaling limit on the
> global lock?
>
> I think a lot about the things that made the relational database approach
> so successful and certainly one thing is that row-level locking corresponds
> well to real-life access patterns.
>



-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren

Re: A proposal for a new locking strategy

Posted by Paul Houle <on...@gmail.com>.
I think it is a worthwhile idea.  Given that you still have to take a
global lock in order to get a triple lock, isn't there still a scaling
limit on the global lock?

I think a lot about the things that made the relational database approach
so successful, and certainly one of them is that row-level locking
corresponds well to real-life access patterns.




-- 
Paul Houle

*Applying Schemas for Natural Language Processing, Distributed Systems,
Classification and Text Mining and Data Lakes*

(607) 539 6254    paul.houle on Skype   ontology2@gmail.com

:BaseKB -- Query Freebase Data With SPARQL
http://basekb.com/gold/

Legal Entity Identifier Lookup
https://legalentityidentifier.info/lei/lookup/
<http://legalentityidentifier.info/lei/lookup/>

Join our Data Lakes group on LinkedIn
https://www.linkedin.com/grp/home?gid=8267275