Posted to dev@jena.apache.org by "Stephen Allen (JIRA)" <ji...@apache.org> on 2011/03/17 18:22:29 UTC

[jira] Issue Comment Edited: (JENA-41) Different policy for concurrency access in TDB supporting a single writer and multiple readers

    [ https://issues.apache.org/jira/browse/JENA-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008009#comment-13008009 ] 

Stephen Allen edited comment on JENA-41 at 3/17/11 5:20 PM:
------------------------------------------------------------

I have been working on adding transactions to Parliament (using a WAL for durability but with MVCC to provide nonblocking readers).  I've been thinking about the interface between ARQ and graph stores and have attached some code and interfaces I have come up with.

I see two main ways to modify the DatasetGraph interface to provide transactions.

1)  Add explicit transaction support to the interface (see TransactionalDatasetGraph.java).  This means either breaking interface changes or a second code path to deal with these new DatasetGraphs.  But it has the benefit of cleaner syntax when a single thread uses more than one transaction.  Some uses for this capability would be: a) eliminating memory/disk buffers in the SPARQL Update code by obtaining sequential transactions for stores that support snapshot isolation; b) simplifying transaction handling if queries were changed not to run on a single thread (either parallel execution or NIO-style asynchronous execution).  See [1] for sample usage with multiple transactions.

2)  Tie transaction information to the current thread (see TransactionHandler.java).  This works well with the current ARQ query execution but becomes more unwieldy when working with multiple transactions on a single thread.  See [2] for example usage.



[1] 
    public void MultipleTransactions1()
    {
        TransactionalDatasetGraph tdsg = ...
        Quad quadToAdd = ...
        
        // Omitted error handling for clarity

        // A read-only transaction
        Transaction t1 = tdsg.getTransactionManager().begin(IsolationLevel.SERIALIZABLE, AccessMode.READ_ONLY);
        
        // A read-write transaction
        Transaction t2 = tdsg.getTransactionManager().begin(IsolationLevel.SERIALIZABLE, AccessMode.READ_WRITE);
        
        System.out.println("t1.size()= " + tdsg.size(t1));
        
        System.out.println("Adding statement in t2");
        tdsg.add(quadToAdd, t2);
        System.out.println("t2.size()= " + tdsg.size(t2));
        t2.commit();
        
        System.out.println("t1.size()= " + tdsg.size(t1));
        t1.commit();
        
        // Output
        // ------
        // t1.size()= 0
        // Adding statement in t2
        // t2.size()= 1
        // t1.size()= 0
    }
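
The output in [1] can be reproduced with a self-contained toy model. This is only a sketch under loud assumptions: ToySnapshotStore, Tx and their methods are hypothetical names invented for illustration, quads are plain strings, and the store simply copies the committed set when a transaction begins. It is not the attached TransactionalDatasetGraph, nor how Parliament implements MVCC, and it assumes a single writer, so it does no conflict detection.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy store, for illustration only (not the attached interfaces).
final class ToySnapshotStore {
    // The most recently committed state.
    private Set<String> committed = new HashSet<>();

    // A transaction holds a private copy of the state taken at begin().
    static final class Tx {
        final Set<String> view;
        final boolean readOnly;

        Tx(Set<String> snapshot, boolean readOnly) {
            this.view = snapshot;
            this.readOnly = readOnly;
        }
    }

    synchronized Tx begin(boolean readOnly) {
        return new Tx(new HashSet<>(committed), readOnly);
    }

    // Writes touch only the transaction's private copy.
    void add(String quad, Tx tx) {
        if (tx.readOnly) {
            throw new IllegalStateException("read-only transaction");
        }
        tx.view.add(quad);
    }

    int size(Tx tx) {
        return tx.view.size();
    }

    // Committing a writer publishes its copy; committing a reader is a no-op.
    synchronized void commit(Tx tx) {
        if (!tx.readOnly) {
            committed = new HashSet<>(tx.view);
        }
    }
}

final class ToySnapshotDemo {
    public static void main(String[] args) {
        ToySnapshotStore store = new ToySnapshotStore();
        ToySnapshotStore.Tx t1 = store.begin(true);   // read-only
        ToySnapshotStore.Tx t2 = store.begin(false);  // read-write

        System.out.println("t1.size()= " + store.size(t1)); // t1.size()= 0
        System.out.println("Adding statement in t2");
        store.add("quadToAdd", t2);
        System.out.println("t2.size()= " + store.size(t2)); // t2.size()= 1
        store.commit(t2);
        System.out.println("t1.size()= " + store.size(t1)); // t1.size()= 0
        store.commit(t1);
    }
}
```

The reader never blocks and never sees the writer's change, because it keeps working on the state it captured at begin(). A transaction started after t2 commits would see size 1.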

[2] 
    public void MultipleTransactions2()
    {
        // Error handling omitted for clarity
        
        DatasetGraph dsg = ...
        Quad quadToAdd = ...
        
        TransactionHandler th = dsg.getTransactionHandler();
        
        // A read-only transaction
        th.begin(IsolationLevel.SERIALIZABLE, AccessMode.READ_ONLY);
        TransactionHandle t1 = th.suspend();
        
        // A read-write transaction
        th.begin(IsolationLevel.SERIALIZABLE, AccessMode.READ_WRITE);
        TransactionHandle t2 = th.suspend();
        
        th.resume(t1);
        System.out.println("t1.size()= " + dsg.size());
        
        t1 = th.suspend();
        th.resume(t2);
        System.out.println("Adding statement in t2");
        dsg.add(quadToAdd);
        System.out.println("t2.size()= " + dsg.size());
        th.commit();
        
        th.resume(t1);
        System.out.println("t1.size()= " + dsg.size());
        th.commit();
        
        // Output
        // ------
        // t1.size()= 0
        // Adding statement in t2
        // t2.size()= 1
        // t1.size()= 0
    }
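
To make the suspend/resume bookkeeping in [2] concrete, here is a minimal sketch of how a handler might bind one active transaction to the calling thread. It assumes a plain java.lang.ThreadLocal slot per thread. ToyThreadBoundHandler, Handle and activeName() are hypothetical names for illustration only; they are not the contents of the attached TransactionHandler.java, and the sketch tracks only which transaction is active, not any data.

```java
// Hypothetical thread-bound handler, for illustration only.
final class ToyThreadBoundHandler {
    // Opaque token for a transaction that has been detached from its thread.
    static final class Handle {
        private final String name;

        Handle(String name) {
            this.name = name;
        }

        String name() {
            return name;
        }
    }

    // At most one active transaction per thread.
    private final ThreadLocal<Handle> current = new ThreadLocal<>();

    void begin(String name) {
        requireIdle();
        current.set(new Handle(name));
    }

    // Detach the active transaction so another one can use this thread.
    Handle suspend() {
        Handle h = requireActive();
        current.remove();
        return h;
    }

    // Re-attach a previously suspended transaction to this thread.
    void resume(Handle h) {
        requireIdle();
        current.set(h);
    }

    void commit() {
        requireActive();
        current.remove();
    }

    // Name of the transaction bound to the calling thread, or null if none.
    String activeName() {
        Handle h = current.get();
        return h == null ? null : h.name();
    }

    private Handle requireActive() {
        Handle h = current.get();
        if (h == null) {
            throw new IllegalStateException("no active transaction on this thread");
        }
        return h;
    }

    private void requireIdle() {
        if (current.get() != null) {
            throw new IllegalStateException("a transaction is already active on this thread");
        }
    }
}

final class ToyThreadBoundDemo {
    public static void main(String[] args) {
        ToyThreadBoundHandler th = new ToyThreadBoundHandler();
        th.begin("t1");
        ToyThreadBoundHandler.Handle t1 = th.suspend();
        th.begin("t2");
        ToyThreadBoundHandler.Handle t2 = th.suspend();

        th.resume(t1);
        System.out.println("active: " + th.activeName()); // active: t1
        t1 = th.suspend();
        th.resume(t2);
        System.out.println("active: " + th.activeName()); // active: t2
        th.commit();
        th.resume(t1);
        th.commit();
        System.out.println("active: " + th.activeName()); // active: null
    }
}
```

Every switch between transactions costs a suspend/resume pair, which is the unwieldiness noted in option 2. In exchange, methods such as size() and add() need no transaction argument.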


 

      was (Author: sallen):
    I have been working on adding transactions to Parliament (using a WAL for durability but with MVCC to provide nonblocking readers).  I've been thinking about the interface between ARQ and graph stores and have attached some code and interfaces I have come up with.

I see two main ways to modify the DatasetGraph interface to provide transactions.

1)  Add explicit transaction support to the interface (see TransactionalDatasetGraph.java).  This means either breaking interface changes or a second code path to deal with these new DatasetGraphs.  But it has the benefit of cleaner syntax when a single thread uses more than one transaction.  Some uses for this capability would be: a) eliminating memory/disk buffers in the SPARQL Update code by obtaining sequential transactions for stores that support snapshot isolation; b) simplifying transaction handling if queries were changed not to run on a single thread (either parallel execution or NIO-style asynchronous execution).  See [1] for sample usage with multiple transactions.

2)  Tie transaction information to the current thread (see TransactionHandler.java).  This works well with the current ARQ query execution but becomes more unwieldy when working with multiple transactions on a single thread.  See [2] for example usage.



[1] 
   public void MultipleTransactions1()
    {
        TransactionalDatasetGraph tdsg = ...
        Quad quadToAdd = ...
        
        // Omitted error handling for clarity

        // A read-only transaction
        Transaction t1 = tdsg.getTransactionManager().begin(IsolationLevel.READ_UNCOMMITTED, AccessMode.READ_ONLY);
        
        // A read-write transaction
        Transaction t2 = tdsg.getTransactionManager().begin(IsolationLevel.SERIALIZABLE, AccessMode.READ_WRITE);
        
        System.out.println("t1.size()= " + tdsg.size(t1));
        
        System.out.println("Adding statement in t2");
        tdsg.add(quadToAdd, t2);
        System.out.println("t2.size()= " + tdsg.size(t2));
        t2.commit();
        
        System.out.println("t1.size()= " + tdsg.size(t1));
        t1.commit();
        
        // Output
        // ------
        // t1.size()= 0
        // Adding statement in t2
        // t2.size()= 1
        // t1.size()= 0
    }

[2] 
   public void MultipleTransactions2()
    {
        // Error handling omitted for clarity
        
        DatasetGraph dsg = ...
        Quad quadToAdd = ...
        
        TransactionHandler th = dsg.getTransactionHandler();
        
        // A read-only transaction
        th.begin(IsolationLevel.READ_UNCOMMITTED, AccessMode.READ_ONLY);
        TransactionHandle t1 = th.suspend();
        
        // A read-write transaction
        th.begin(IsolationLevel.SERIALIZABLE, AccessMode.READ_WRITE);
        TransactionHandle t2 = th.suspend();
        
        th.resume(t1);
        System.out.println("t1.size()= " + dsg.size());
        
        t1 = th.suspend();
        th.resume(t2);
        System.out.println("Adding statement in t2");
        dsg.add(quadToAdd);
        System.out.println("t2.size()= " + dsg.size());
        th.commit();
        
        th.resume(t1);
        System.out.println("t1.size()= " + dsg.size());
        th.commit();
        
        // Output
        // ------
        // t1.size()= 0
        // Adding statement in t2
        // t2.size()= 1
        // t1.size()= 0
    }


 
  
> Different policy for concurrency access in TDB supporting a single writer and multiple readers
> ----------------------------------------------------------------------------------------------
>
>                 Key: JENA-41
>                 URL: https://issues.apache.org/jira/browse/JENA-41
>             Project: Jena
>          Issue Type: New Feature
>          Components: Fuseki, TDB
>            Reporter: Paolo Castagna
>         Attachments: Transaction.java, TransactionHandle.java, TransactionHandler.java, TransactionManager.java, TransactionManagerBase.java, TransactionalDatasetGraph.java
>
>
> As a follow-up to a discussion about "Concurrent updates in TDB" [1] on the jena-users mailing list, I am creating this as a new feature request.
> Currently TDB requires developers to use a Multiple Reader or Single Writer (MRSW) locking policy for concurrent access [2]. Not doing so can cause data corruption.
> MRSW is in fact MR xor SW (i.e. while a writer holds the lock, no readers are allowed and, similarly, while a reader holds the lock, no writes are possible).
> This works fine in most situations, but there can be problems in the presence of long writes or long reads.
> It has been suggested that "journaled file access" could be used to solve the problem of a long write blocking reads.
>  [1] http://markmail.org/message/jnqm6pn32df4wgte
>  [2] http://openjena.org/wiki/TDB/JavaAPI#Concurrency
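
The MR xor SW behaviour described in the issue text can be illustrated with the JDK's java.util.concurrent.locks.ReentrantReadWriteLock. This is a sketch only: it uses the standard-library lock as a stand-in, not TDB's own locking classes, and MrswDemo and canAcquire are hypothetical names. One thread holds a read or write lock while a second thread makes a non-blocking tryLock() attempt.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; the JDK lock stands in for an MRSW policy.
final class MrswDemo {
    // Holds one lock on the calling thread and reports whether a second
    // thread can acquire the other one without waiting.
    static boolean canAcquire(boolean holdWrite, boolean wantWrite) {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        Lock held = holdWrite ? rw.writeLock() : rw.readLock();
        Lock wanted = wantWrite ? rw.writeLock() : rw.readLock();
        boolean[] acquired = new boolean[1];

        held.lock();
        try {
            Thread other = new Thread(() -> {
                acquired[0] = wanted.tryLock();
                if (acquired[0]) {
                    wanted.unlock();
                }
            });
            other.start();
            other.join(); // join() makes the other thread's result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            held.unlock();
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("reader while reading: " + canAcquire(false, false)); // true
        System.out.println("writer while reading: " + canAcquire(false, true));  // false
        System.out.println("reader while writing: " + canAcquire(true, false));  // false
        System.out.println("writer while writing: " + canAcquire(true, true));   // false
    }
}
```

With blocking lock() calls instead of tryLock(), the three false cases become waits. That is why a long write stalls every reader, and why a long read stalls the writer.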

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira