Posted to dev@jena.apache.org by "ajs6f@virginia.edu" <aj...@virginia.edu> on 2015/08/03 18:13:40 UTC

Re: Journaling DatasetGraph

I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance.
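
A minimal sketch of the idea, not the actual patch: the names LockKind, KindedLock, MrswLock, MutexLock, and requireMrsw are hypothetical and are not part of Jena. It only shows how a lock could report its own category through an enumeration, so that a wrapper can check for MRSW behavior without relying on the class hierarchy.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockKindSketch {
    /** Hypothetical categories a lock might report about itself. */
    enum LockKind { MRSW, MUTEX }

    /** Hypothetical stand-in for a public lock type that anyone can implement. */
    interface KindedLock {
        void enterCriticalSection(boolean readLockRequested);
        void leaveCriticalSection();
        LockKind kind();
    }

    /** Multiple-reader/single-writer lock built on ReentrantReadWriteLock. */
    static final class MrswLock implements KindedLock {
        private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        public void enterCriticalSection(boolean readLockRequested) {
            if (readLockRequested) rw.readLock().lock(); else rw.writeLock().lock();
        }
        public void leaveCriticalSection() {
            // Release whichever side the current thread holds.
            if (rw.isWriteLockedByCurrentThread()) rw.writeLock().unlock();
            else rw.readLock().unlock();
        }
        public LockKind kind() { return LockKind.MRSW; }
    }

    /** Plain mutex: readers exclude each other, so it reports MUTEX. */
    static final class MutexLock implements KindedLock {
        private final ReentrantLock mutex = new ReentrantLock();
        public void enterCriticalSection(boolean readLockRequested) { mutex.lock(); }
        public void leaveCriticalSection() { mutex.unlock(); }
        public LockKind kind() { return LockKind.MUTEX; }
    }

    /** A wrapper can ask the lock what it is instead of using instanceof. */
    static KindedLock requireMrsw(KindedLock lock) {
        if (lock.kind() != LockKind.MRSW)
            throw new IllegalArgumentException("expected an MRSW lock but got " + lock.kind());
        return lock;
    }

    public static void main(String[] args) {
        KindedLock lock = requireMrsw(new MrswLock());
        lock.enterCriticalSection(true);
        lock.leaveCriticalSection();
        System.out.println(lock.kind());
    }
}
```

The check depends on each implementation reporting its kind honestly, which is the cost of categorizing locks outside the type system.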

Feedback welcome!

---
A. Soroka
The University of Virginia Library

On Jul 29, 2015, at 5:04 PM, Andy Seaborne <an...@apache.org> wrote:

> The lock provided by the underlying dataset may matter.  DatasetGraphs support critical sections.  DatasetGraphWithLock uses critical sections of the underlying dataset.
> 
> I gave a (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors).
> 
> DatasetGraphWithRecord is relying on single-W for its own datastructures.
> 
> 	Andy
> 
> On 29/07/15 21:22, ajs6f@virginia.edu wrote:
>> I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless.
>> 
>> Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord?
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>> On Jul 29, 2015, at 4:14 PM, Andy Seaborne <an...@apache.org> wrote:
>> 
>>> On 27/07/15 18:06, ajs6f@virginia.edu wrote:
>>>>> Is there some specific reason as to why you override the DatasetGraphWithLock lock?
>>>> Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin})
>>>> 
>>> 
>>> A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1]
>>> 
>>> Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for read).
>>> 
>>> There aren't any (the rules are R-safe) - locks are always LockMRSW.
>>> 
>>> [1] http://jena.apache.org/documentation/notes/concurrency-howto.html
>>> 
>>> 	Andy
>>> 
>> 
> 


Re: Journaling DatasetGraph

Posted by "ajs6f@virginia.edu" <aj...@virginia.edu>.
Thanks for the feedback Andy.

> 2/
> Datasets that provide support for MW cases and don't provide transactions seem rather unlikely, so maybe document what kind of DatasetGraph is being supported by DatasetGraphWithRecord and then just use the underlying lock.

Okay, that's certainly simpler! And it keeps my grubby fingers out of Lock. {grin} 

> 3/
> There are two things to protect in DatasetGraphWithRecord: the underlying dataset and the transaction log for supporting abort for writers only.  They can have separate mechanisms.  Use the dataset lock for the DatasetGraph actions and make the transaction undo log operations safe by other means.

You mean an independent lock visible only inside DatasetGraphWithRecord?

> .. hmm ... the order of entries in the log may matter, so true parallel MW looks increasingly hard to deal with anyway.  Document and not worry for now?

My fear has been that MW means

a) a log per write-transaction and connections from the transaction to a particular set of states for the indexes
b) with those "forward" states invisible outside the transaction
c) and all the nightmare fun of merging states!

---
A. Soroka
The University of Virginia Library

On Aug 4, 2015, at 4:32 PM, Andy Seaborne <an...@apache.org> wrote:

> On 03/08/15 17:13, ajs6f@virginia.edu wrote:
>> I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance.
>> 
>> Feedback welcome!
> 
> A few things occur to me:
> 
> 1/
> The transaction log is for supporting abort for writers only.  Nothing needs to be done in DatasetGraphWithRecord for readers. DatasetGraphWithLock does what's needed.  So you don't even need to startRecording for a READ (and commit clearing the log so that _end can always abort is an interesting way to do it!).
> 
> 2/
> Datasets that provide support for MW cases and don't provide transactions seem rather unlikely, so maybe document what kind of DatasetGraph is being supported by DatasetGraphWithRecord and then just use the underlying lock.
> 
> It's not just a case of using ConcurrentHashMap, say: there would likely be several of them, one per index, and that would give weird consistency issues.  Each map gets updated safely on its own, but the dataset as a whole will look different depending on which index the reader uses.  So I think MW will need additional coordination.
> 
> 3/
> 
> There are two things to protect in DatasetGraphWithRecord: the underlying dataset and the transaction log for supporting abort for writers only.  They can have separate mechanisms.  Use the dataset lock for the DatasetGraph actions and make the transaction undo log operations safe by other means.
> 
> .. hmm ... the order of entries in the log may matter, so true parallel MW looks increasingly hard to deal with anyway.  Document and not worry for now?
> 
> 	Andy
> 
>> 
>> ---
>> A. Soroka
>> The University of Virginia Library
>> 
>> On Jul 29, 2015, at 5:04 PM, Andy Seaborne <an...@apache.org> wrote:
>> 
>>> The lock provided by the underlying dataset may matter.  DatasetGraphs support critical sections.  DatasetGraphWithLock uses critical sections of the underlying dataset.
>>> 
>>> I gave a (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors).
>>> 
>>> DatasetGraphWithRecord is relying on single-W for its own datastructures.
>>> 
>>> 	Andy
>>> 
>>> On 29/07/15 21:22, ajs6f@virginia.edu wrote:
>>>> I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless.
>>>> 
>>>> Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord?
>>>> 
>>>> ---
>>>> A. Soroka
>>>> The University of Virginia Library
>>>> 
>>>> On Jul 29, 2015, at 4:14 PM, Andy Seaborne <an...@apache.org> wrote:
>>>> 
>>>>> On 27/07/15 18:06, ajs6f@virginia.edu wrote:
>>>>>>> Is there some specific reason as to why you override the DatasetGraphWithLock lock?
>>>>>> Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin})
>>>>>> 
>>>>> 
>>>>> A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1]
>>>>> 
>>>>> Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for read).
>>>>> 
>>>>> There aren't any (the rules are R-safe) - locks are always LockMRSW.
>>>>> 
>>>>> [1] http://jena.apache.org/documentation/notes/concurrency-howto.html
>>>>> 
>>>>> 	Andy
>>>>> 
>>>> 
>>> 
>> 
> 


Re: Journaling DatasetGraph

Posted by Andy Seaborne <an...@apache.org>.
On 03/08/15 17:13, ajs6f@virginia.edu wrote:
> I've made some emendations to (hopefully) fix this problem. In order to do so, I added a method to Lock itself to report the quality of an instance, simply as an enumeration. I had hoped to avoid touching any of the extant code, but because Lock is a public type that can be instantiated by anyone, I just can't see how to resolve this problem without some way for a Lock to categorize itself independently of the type system's inheritance.
>
> Feedback welcome!

A few things occur to me:

1/
The transaction log is for supporting abort for writers only.  Nothing
needs to be done in DatasetGraphWithRecord for readers.
DatasetGraphWithLock does what's needed.  So you don't even need to
startRecording for a READ (and commit clearing the log so that _end can
always abort is an interesting way to do it!).
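
A rough, single-threaded sketch of that reading, under assumptions: the
class WriterOnlyJournal and its methods are hypothetical, a HashSet of
strings stands in for the wrapped dataset, and locking is left out
entirely.  It records undo actions only inside a WRITE, has commit clear
the log, and lets end always run abort, which is then a no-op after a
commit.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

public class WriterOnlyJournal {
    enum Mode { READ, WRITE }

    private final Set<String> data = new HashSet<>();          // stand-in for the wrapped dataset
    private final Deque<Runnable> undoLog = new ArrayDeque<>(); // most recent change on top
    private Mode mode;                                          // null outside a transaction

    void begin(Mode m) { mode = m; }    // nothing to record for READ

    void add(String quad) {
        requireWrite();
        // Log an undo action only if the dataset actually changed.
        if (data.add(quad)) undoLog.push(() -> data.remove(quad));
    }

    void delete(String quad) {
        requireWrite();
        if (data.remove(quad)) undoLog.push(() -> data.add(quad));
    }

    boolean contains(String quad) { return data.contains(quad); }

    void commit() { undoLog.clear(); }  // keep the changes, forget how to undo them

    void abort() {
        // Replay undo actions newest-first.
        while (!undoLog.isEmpty()) undoLog.pop().run();
    }

    void end() {
        abort();                        // no-op if commit already cleared the log
        mode = null;
    }

    private void requireWrite() {
        if (mode != Mode.WRITE) throw new IllegalStateException("not in a WRITE transaction");
    }

    public static void main(String[] args) {
        WriterOnlyJournal j = new WriterOnlyJournal();
        j.begin(Mode.WRITE); j.add("q1"); j.commit(); j.end();
        j.begin(Mode.WRITE); j.add("q2"); j.end();   // ended without commit
        System.out.println(j.contains("q1") + " " + j.contains("q2"));
    }
}
```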

2/
Datasets that provide support for MW cases and don't provide
transactions seem rather unlikely, so maybe document what kind of
DatasetGraph is being supported by DatasetGraphWithRecord and then just
use the underlying lock.

It's not just a case of using ConcurrentHashMap, say: there would
likely be several of them, one per index, and that would give weird
consistency issues.  Each map gets updated safely on its own, but the
dataset as a whole will look different depending on which index the
reader uses.  So I think MW will need additional coordination.
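
To illustrate the point with a sketch (the class TwoIndexSketch and its
two string-keyed indexes are hypothetical, not Jena code): each
ConcurrentHashMap is thread-safe on its own, but an update touches both,
so something has to make the pair change together.  Here a single
read/write lock does that, which amounts to falling back to MRSW rather
than supporting MW.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TwoIndexSketch {
    // Two indexes over the same (subject, object) pairs.
    private final Map<String, Set<String>> bySubject = new ConcurrentHashMap<>();
    private final Map<String, Set<String>> byObject = new ConcurrentHashMap<>();
    // Coordination across both maps; the maps alone cannot provide it.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    void add(String s, String o) {
        lock.writeLock().lock();
        try {
            bySubject.computeIfAbsent(s, k -> ConcurrentHashMap.<String>newKeySet()).add(o);
            // Without the lock, a reader arriving here could find the pair
            // through bySubject but not yet through byObject.
            byObject.computeIfAbsent(o, k -> ConcurrentHashMap.<String>newKeySet()).add(s);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean containsViaSubject(String s, String o) {
        lock.readLock().lock();
        try {
            return bySubject.getOrDefault(s, Collections.<String>emptySet()).contains(o);
        } finally {
            lock.readLock().unlock();
        }
    }

    boolean containsViaObject(String s, String o) {
        lock.readLock().lock();
        try {
            return byObject.getOrDefault(o, Collections.<String>emptySet()).contains(s);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        TwoIndexSketch t = new TwoIndexSketch();
        t.add("s", "o");
        System.out.println(t.containsViaSubject("s", "o") && t.containsViaObject("s", "o"));
    }
}
```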

3/

There are two things to protect in DatasetGraphWithRecord: the
underlying dataset and the transaction log for supporting abort for
writers only.  They can have separate mechanisms.  Use the dataset lock
for the DatasetGraph actions and make the transaction undo log
operations safe by other means.

.. hmm ... the order of entries in the log may matter, so true parallel
MW looks increasingly hard to deal with anyway.  Document and not worry
for now?
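
One possible shape for "separate mechanisms", as a sketch under
assumptions: SeparateGuardsSketch, its list-backed dataset, and the
private logGuard monitor are all hypothetical.  The lock handed in plays
the part of the underlying dataset's lock, while the undo log has its own
guard that is never exposed.  The log is a Deque, so entry order is kept
and abort undoes the newest change first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SeparateGuardsSketch {
    private final List<String> dataset = new ArrayList<>();    // stand-in for the wrapped dataset
    private final ReadWriteLock datasetLock;                   // supplied by the "underlying dataset"
    private final Deque<String> undoLog = new ArrayDeque<>();  // added quads, newest on top
    private final Object logGuard = new Object();              // private guard for the log only

    SeparateGuardsSketch(ReadWriteLock datasetLock) { this.datasetLock = datasetLock; }

    void add(String quad) {
        datasetLock.writeLock().lock();
        try {
            dataset.add(quad);
            synchronized (logGuard) { undoLog.push(quad); }
        } finally {
            datasetLock.writeLock().unlock();
        }
    }

    void commit() {
        synchronized (logGuard) { undoLog.clear(); }
    }

    void abort() {
        // Lock order is always dataset lock first, then the log guard.
        datasetLock.writeLock().lock();
        try {
            synchronized (logGuard) {
                while (!undoLog.isEmpty()) dataset.remove(undoLog.pop());
            }
        } finally {
            datasetLock.writeLock().unlock();
        }
    }

    int size() {
        datasetLock.readLock().lock();
        try { return dataset.size(); } finally { datasetLock.readLock().unlock(); }
    }

    int pendingUndoEntries() {
        synchronized (logGuard) { return undoLog.size(); }
    }

    public static void main(String[] args) {
        SeparateGuardsSketch g = new SeparateGuardsSketch(new ReentrantReadWriteLock());
        g.add("a"); g.commit();
        g.add("b"); g.abort();
        System.out.println(g.size() + " " + g.pendingUndoEntries());
    }
}
```

This still assumes a single writer at a time for the dataset itself; it
does nothing for true parallel MW.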

	Andy

>
> ---
> A. Soroka
> The University of Virginia Library
>
> On Jul 29, 2015, at 5:04 PM, Andy Seaborne <an...@apache.org> wrote:
>
>> The lock provided by the underlying dataset may matter.  DatasetGraphs support critical sections.  DatasetGraphWithLock uses critical sections of the underlying dataset.
>>
>> I gave a (hypothetical) example where the lock must be more restrictive than ReentrantReadWriteLock (LockMRSW is a ReentrantReadWriteLock + counting support to catch application errors).
>>
>> DatasetGraphWithRecord is relying on single-W for its own datastructures.
>>
>> 	Andy
>>
>> On 29/07/15 21:22, ajs6f@virginia.edu wrote:
>>> I'm not sure I understand this advice-- are you saying that because no DatasetGraph can be assumed to support MR, there isn't any point in trying to support MR at the level of DatasetGraphWithRecord? That would seem to make my whole effort a bit pointless.
>>>
>>> Or are you saying that because, in practice, all DatasetGraphs _do_ support MR, there's no need to enforce it at the level of DatasetGraphWithRecord?
>>>
>>> ---
>>> A. Soroka
>>> The University of Virginia Library
>>>
>>> On Jul 29, 2015, at 4:14 PM, Andy Seaborne <an...@apache.org> wrote:
>>>
>>>> On 27/07/15 18:06, ajs6f@virginia.edu wrote:
>>>>>> Is there some specific reason as to why you override the DatasetGraphWithLock lock?
>>>>> Yes, because DatasetGraphWithLock has no Lock that I could find, and it inherits getLock() from DatasetGraphTrackActive, which just pulls the lock from the wrapped DatasetGraph. I wanted to make sure that a MRSW Lock is in play. But maybe I am misunderstanding the interaction here? (No surprise! {grin})
>>>>>
>>>>
>>>> A DatasetGraph provides whatever lock is suitable to meet the contract of concurrency [1]
>>>>
>>>> Some implementations (there aren't any) may not even be able to support true parallel readers (for example, datastructures that may make internal changes even in read operations, like moving recently accessed items to the top or caching computation needed for read).
>>>>
>>>> There aren't any (the rules are R-safe) - locks are always LockMRSW.
>>>>
>>>> [1] http://jena.apache.org/documentation/notes/concurrency-howto.html
>>>>
>>>> 	Andy
>>>>
>>>
>>
>