Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2012/09/20 13:23:07 UTC

[jira] [Commented] (JENA-327) TDB Tx transaction lock to permit backups

    [ https://issues.apache.org/jira/browse/JENA-327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459519#comment-13459519 ] 

Andy Seaborne commented on JENA-327:
------------------------------------

You can start a READ transaction and then work with the dataset (or DatasetGraph). This is the most stable approach because it relies only on the current API contract. Fuseki uses it to take dataset backups (it writes NQuads). Operations can continue while the backup is done.

Or you can start a WRITE transaction, whereby you are guaranteed nothing else will be changing the files on disk.  Even if async writeback is introduced, an open write transaction is going to hold back writeback.

Finally, you could manage the request flow in the client by holding everything up, flushing the journal manually, then backing up the files.
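
As a rough illustration of this third option, the sketch below holds up request flow on the client side with a plain java.util.concurrent read/write lock. It is only a sketch under stated assumptions: BackupGate, request, backup, flushJournal and copyFiles are hypothetical names, not part of Jena, and how the journal is actually flushed is left to the caller because it depends on TDB internals.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical client-side gate; none of these names come from Jena. */
final class BackupGate {
    // Ordinary requests share the read lock; a backup takes the write lock.
    // Fair mode keeps a waiting backup from being starved by new requests.
    private final ReentrantReadWriteLock gate = new ReentrantReadWriteLock(true);

    /** Run one ordinary request (its own TDB transaction goes inside work). */
    <T> T request(Supplier<T> work) {
        gate.readLock().lock();
        try {
            return work.get();
        } finally {
            gate.readLock().unlock();
        }
    }

    /** Hold up all requests, flush the journal, then copy the files. */
    void backup(Runnable flushJournal, Runnable copyFiles) {
        gate.writeLock().lock();   // blocks until in-flight requests finish
        try {
            flushJournal.run();    // assumption: caller knows how to flush
            copyFiles.run();       // assumption: caller copies the TDB files
        } finally {
            gate.writeLock().unlock();
        }
    }
}
```

Every request, reader or writer, passes through request(), so backup() runs only when nothing is in flight. The gate does nothing about disk activity that TDB starts on its own, which is the weakness of this option.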

The READ mechanism is the safest long-term and does not block other threads (readers or writers).
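
For reference, a minimal sketch of the READ-transaction approach might look like the following. It assumes the Jena 3.x or later package names (org.apache.jena.*; releases from the time of this message used com.hp.hpl.jena.*) and RIOT's RDFDataMgr for the NQuads output. ReadTxnBackup and its backup method are hypothetical names, and this is not necessarily how Fuseki implements its backup.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

/** Hypothetical helper: dump a transactional dataset as NQuads. */
final class ReadTxnBackup {
    static void backup(Dataset dataset, Path target) throws IOException {
        // The READ transaction gives a consistent view of the data;
        // other readers and writers are not blocked while it is open.
        dataset.begin(ReadWrite.READ);
        try (OutputStream out =
                 new BufferedOutputStream(Files.newOutputStream(target))) {
            RDFDataMgr.write(out, dataset, Lang.NQUADS);
        } finally {
            dataset.end();   // always release the transaction
        }
    }
}
```

The result is a logical dump of the data rather than a copy of the TDB files, so it does not depend on when TDB touches the disk.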

                
> TDB Tx transaction lock to permit backups
> -----------------------------------------
>
>                 Key: JENA-327
>                 URL: https://issues.apache.org/jira/browse/JENA-327
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>    Affects Versions: TDB 0.9.4
>            Reporter: Simon Helsen
>
> With large repositories, it is important to be able to create backups once in a while, because recreating an RDF store with millions of triples can be prohibitively expensive. Moreover, it should be possible to take those backups while still allowing read activity on the store, since in many cases a complete shutdown is not possible. Before the introduction of tx, it was relatively straightforward to provide the right locks on the client side to safely suspend any disk activity for long enough to make a backup of the index.
> However, since tx, things have become slightly more complicated because TDB Tx touches the disk at times other than when performing write/sync activities. Right now, with some understanding of how TDB Tx is implemented, it is still possible for clients to avoid disk activity and so implement a backup process, but this dependency on TDB Tx implementation details is undesirable. Moreover, we anticipate that in the future, the merging process from the journal into the main index may become entirely asynchronous for performance reasons. The moment that happens, clients no longer have any control over when the disk is touched.
> For this reason, we are requesting the following feature: a "backup" lock (for lack of a better name). Its semantics are that while the lock is held, TDB Tx guarantees that no disk activity takes place, pausing activities if necessary. In other words, no write transaction should be able to complete and read transactions will not attempt to merge the journal. The idea is that regular read activities can still continue. The API could be as simple as this:
> try {
>     dataset.begin(ReadWrite.BACKUP) ;   // proposed mode; not in the current API
>     // <do whatever is necessary to back up the index>
> } finally {
>     dataset.end() ;
> }
> As for the implementation, we suspect you currently have locks in place which could be used to guarantee this behavior. E.g. could txn.getBaseDataset().getLock().enterCriticalSection(Lock.WRITE) be sufficient?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira