Posted to dev@jena.apache.org by "Stephen Allen (JIRA)" <ji...@apache.org> on 2011/03/21 23:15:05 UTC
[jira] [Issue Comment Edited] (JENA-41) Different policy for concurrency access in TDB supporting a single writer and multiple readers

    [ https://issues.apache.org/jira/browse/JENA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13009433#comment-13009433 ] 

Stephen Allen edited comment on JENA-41 at 3/21/11 10:14 PM:
-------------------------------------------------------------

I think your idea about the DatasetGraph being the interface for transactions makes sense.  Transactional DatasetGraphs could also provide fallback behavior for legacy code by implementing autocommit transactions if the user called methods on a dataset that was not initialized in a transactionBegin() call.
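
As a rough sketch of what that autocommit fallback might look like: only transactionBegin() comes from the discussion above; the class AutocommitDataset and its other methods are hypothetical stand-ins, not Jena or TDB API, and the "dataset" is just a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transactional dataset; not Jena's API. */
class AutocommitDataset {
    private final List<String> committed = new ArrayList<>();
    // Non-null only while an explicit transaction is open.
    private List<String> pending = null;

    void transactionBegin() {
        if (pending != null) {
            throw new IllegalStateException("transaction already open");
        }
        pending = new ArrayList<>();
    }

    void add(String quad) {
        if (pending != null) {
            // Explicit transaction: defer the change until commit().
            pending.add(quad);
        } else {
            // Legacy caller that never began a transaction:
            // wrap this single operation in its own transaction.
            transactionBegin();
            pending.add(quad);
            commit();
        }
    }

    void commit() {
        if (pending == null) {
            throw new IllegalStateException("no open transaction");
        }
        committed.addAll(pending);
        pending = null;
    }

    void abort() {
        // Discard any uncommitted changes.
        pending = null;
    }

    int committedSize() {
        return committed.size();
    }
}
```

Legacy code that simply calls add() sees each change committed immediately, while newer code that calls transactionBegin() first gets the usual commit/abort behavior.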


With regard to the isolation levels, I believe some of the lower levels can make sense for particular applications or queries.  For example, say you want to know the size of a few graphs.

BEGIN READ_ONLY;
select count (*) where { graph <http://example/g1> { ?s ?p ?o . } } ;
select count (*) where { graph <http://example/g2> { ?s ?p ?o . } } ;
COMMIT;

Assuming a traditional pessimistic locking scheme, running the transaction at SERIALIZABLE could cause the locks held by the first select query to also be held through the second query, reducing concurrency (using two transactions instead might not be a good idea as there is usually some amount of overhead associated with creating and committing transactions).

If you were OK with the possibility that the two query results are not truly serializable with respect to each other, then you could improve concurrency by using a READ_COMMITTED isolation level instead, which would give serializable results for each query (but not for the whole transaction).  And if you really just needed a rough estimate of size, using READ_UNCOMMITTED may be able to avoid locking altogether.
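
To make the difference in lock lifetime concrete, here is a toy model of the pessimistic locking scheme assumed above. It only tracks which graph names are "locked"; LockingReadTransaction and its methods are hypothetical and do not correspond to any Jena or TDB class.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical illustration only; names are not Jena or TDB API. */
class LockingReadTransaction {
    enum Level { READ_UNCOMMITTED, READ_COMMITTED, SERIALIZABLE }

    private final Level level;
    private final Set<String> heldReadLocks = new LinkedHashSet<>();

    LockingReadTransaction(Level level) {
        this.level = level;
    }

    /** Simulates one query over a graph and applies the level's lock policy. */
    void query(String graphName) {
        if (level == Level.READ_UNCOMMITTED) {
            return; // no read lock is taken at all
        }
        heldReadLocks.add(graphName); // lock held while the query runs
        // ... the query itself would execute here ...
        if (level == Level.READ_COMMITTED) {
            heldReadLocks.remove(graphName); // released when the query ends
        }
        // SERIALIZABLE: the lock stays held until commit().
    }

    /** Returns a copy of the graph names currently read-locked. */
    Set<String> heldLocks() {
        return new LinkedHashSet<>(heldReadLocks);
    }

    void commit() {
        heldReadLocks.clear(); // all remaining locks are released at commit
    }
}
```

Running the two count queries at SERIALIZABLE leaves both graphs locked until commit(), whereas at READ_COMMITTED nothing is held between queries, which is where the extra concurrency would come from.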

An additional motivating factor for MVCC implementations is that they may be implementing snapshot isolation, which probably maps better to REPEATABLE_READ than SERIALIZABLE (especially if it could do predicate locking for true serializable behavior but allow cheaper snapshot isolation if that was all that was needed).  The Postgres documentation does a good job of describing this [1].

I would find it useful to have multiple isolation levels available (even if internally I'm mapping them all to SERIALIZABLE at first).  The four ANSI Isolation levels seem appropriate, and remember that implementations are allowed to map unavailable lower levels to higher levels as desired.
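
A minimal sketch of that "map up to the next available level" rule, assuming the four ANSI levels are modeled as an enum ordered from weakest to strongest. IsolationLevel and IsolationLevels.resolve are names made up for this example, not part of any existing Jena interface.

```java
import java.util.EnumSet;
import java.util.Set;

/** The four ANSI levels, declared from weakest to strongest. */
enum IsolationLevel { READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE }

/** Hypothetical helper; not part of Jena. */
final class IsolationLevels {
    private IsolationLevels() {}

    /**
     * Returns the weakest supported level that is at least as strong as
     * the requested one, so a request is never silently weakened.
     */
    static IsolationLevel resolve(IsolationLevel requested, Set<IsolationLevel> supported) {
        // values() iterates in declaration order, i.e. weakest first.
        for (IsolationLevel level : IsolationLevel.values()) {
            if (level.compareTo(requested) >= 0 && supported.contains(level)) {
                return level;
            }
        }
        throw new IllegalArgumentException("no supported level at or above " + requested);
    }

    public static void main(String[] args) {
        // An implementation that only offers SERIALIZABLE at first.
        Set<IsolationLevel> supported = EnumSet.of(IsolationLevel.SERIALIZABLE);
        System.out.println(resolve(IsolationLevel.READ_COMMITTED, supported));
    }
}
```

With only SERIALIZABLE supported, every request resolves to SERIALIZABLE, which matches the "map them all to SERIALIZABLE at first" approach; adding lower levels later would not require callers to change.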


[1] http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html



      was (Author: sallen):
    I think your idea about the DatasetGraph being the interface for transactions makes sense.  Transactional DatasetGraphs could also provide fallback behavior for legacy code by implementing autocommit transactions if the user called methods on a dataset that was not initialized in a transactionBegin() call.


With regard to the isolation levels, I believe some of the lower levels can make sense for particular applications or queries.  For example, say you want to know the size of a few graphs.

BEGIN READ_ONLY;
select count (*) where { graph <http://example/g1> { ?s ?p ?o . } } ;
select count (*) where { graph <http://example/g2> { ?s ?p ?o . } } ;
COMMIT;

Assuming a traditional pessimistic locking scheme, running the transaction at SERIALIZABLE could cause the locks held by the first select query to also be held through the second query, reducing concurrency (using two transactions instead might not be a good idea as there is usually some amount of overhead associated with creating and committing transactions).

If you were OK with the possibility that the two query results are not truly serializable with respect to each other, then you could improve concurrency by using a READ_COMMITTED isolation level instead, which would give serializable results for each query (but not for the whole transaction).  And if you really just needed a rough estimate of size, using READ_UNCOMMITTED may be able to avoid locking altogether.

An additional motivating factor for MVCC implementations is that they may be implementing snapshot isolation, which probably maps better to READ_COMMITTED than SERIALIZABLE (especially if it could do predicate locking for true serializable behavior but allow cheaper snapshot isolation if that was all that was needed).  The Postgres documentation does a good job of describing this [1].

I would find it useful to have multiple isolation levels available (even if internally I'm mapping them all to SERIALIZABLE at first).  The four ANSI Isolation levels seem appropriate, and remember that implementations are allowed to map unavailable lower levels to higher levels as desired.


[1] http://developer.postgresql.org/pgdocs/postgres/transaction-iso.html


  
> Different policy for concurrency access in TDB supporting a single writer and multiple readers
> ----------------------------------------------------------------------------------------------
>
>                 Key: JENA-41
>                 URL: https://issues.apache.org/jira/browse/JENA-41
>             Project: Jena
>          Issue Type: New Feature
>          Components: Fuseki, TDB
>            Reporter: Paolo Castagna
>         Attachments: Transaction.java, TransactionHandle.java, TransactionHandler.java, TransactionManager.java, TransactionManagerBase.java, TransactionalDatasetGraph.java
>
>
> As a follow up to a discussion about "Concurrent updates in TDB" [1] on the jena-users mailing list, I am creating this as a new feature request.
> Currently TDB requires developers to use a Multiple Reader or Single Writer (MRSW) locking policy for concurrent access [2]. Not doing this could cause data corruption.
> MRSW is in fact MR xor SW (i.e. while a writer holds a lock, no readers are allowed and, similarly, while a reader holds a lock, no writes are possible).
> This works fine in most situations, but there might be problems in the presence of long writes or long reads.
> It has been suggested that a "journaled file access" could be used to solve the issue regarding a long write blocking reads.
>  [1] http://markmail.org/message/jnqm6pn32df4wgte
>  [2] http://openjena.org/wiki/TDB/JavaAPI#Concurrency
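
For readers unfamiliar with the "MR xor SW" behavior described in the issue, the following single-threaded probe shows the same exclusion rule using java.util.concurrent.locks.StampedLock. This is only an illustration of the policy; it is an assumption of this sketch, not a statement about how TDB implements its locking, and MrswDemo is a made-up class name.

```java
import java.util.concurrent.locks.StampedLock;

/** Illustration of multiple-reader xor single-writer exclusion; not TDB code. */
class MrswDemo {
    /**
     * Returns three flags:
     * [0] two readers could hold the lock at the same time,
     * [1] a writer was refused while readers held the lock,
     * [2] a reader was refused while a writer held the lock.
     */
    static boolean[] probe() {
        StampedLock lock = new StampedLock();

        // Two read locks can be held concurrently (a zero stamp means refusal).
        long r1 = lock.tryReadLock();
        long r2 = lock.tryReadLock();
        boolean twoReaders = r1 != 0 && r2 != 0;

        // While any reader holds the lock, the write lock is unavailable.
        boolean writerRefused = lock.tryWriteLock() == 0;

        lock.unlockRead(r1);
        lock.unlockRead(r2);

        // With no readers left, the writer gets in and readers are refused.
        long w = lock.tryWriteLock();
        boolean readerRefused = lock.tryReadLock() == 0;
        lock.unlockWrite(w);

        return new boolean[] { twoReaders, writerRefused, readerRefused };
    }

    public static void main(String[] args) {
        boolean[] flags = probe();
        System.out.println("two readers at once:      " + flags[0]);
        System.out.println("writer refused by reader: " + flags[1]);
        System.out.println("reader refused by writer: " + flags[2]);
    }
}
```

The third flag is the case the issue is concerned with: a long-running writer keeps every reader out until it finishes.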
