Posted to dev@jena.apache.org by "Andy Seaborne (JIRA)" <ji...@apache.org> on 2015/05/06 12:17:00 UTC

[jira] [Commented] (JENA-648) Make TDB datasets harder to corrupt

    [ https://issues.apache.org/jira/browse/JENA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530293#comment-14530293 ] 

Andy Seaborne commented on JENA-648:
------------------------------------

Someone appears to have found a way to avoid the locking check. The setup is two WAR files running in one web application server (GlassFish, specifically), so the process id is the same, but there are two different TDB instances loaded by two different classloaders.

Report: 
http://mail-archives.apache.org/mod_mbox/jena-users/201505.mbox/%3CCAO4GvXQEhKDb7__twF8%2Bqgw7we1%3DPw-mp%3D-TLcqmOcuELfKjXg%40mail.gmail.com%3E

A stricter check might be possible: allow only one use per process id, relying on each TDB instance to open a given database only once.  If the database's StoreConnection is explicitly expelled, removing the lock file then allows the database to be reopened.  The tests may still be affected.
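
A rough sketch of what such a stricter check could look like, in Java. This is not TDB code: the class name StrictLocationGuard, the lock file name "tdb.lock", and the open/expel methods are all made up for illustration, and it assumes Java 11+ (ProcessHandle, Files.readString). The idea is that static state is per classloader, so a lock file carrying our own process id for a location this instance never opened must belong to another TDB instance in the same JVM.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only; not TDB's StoreConnection or its real locking code.
final class StrictLocationGuard {
    // Assumed lock file name, chosen for this example.
    static final String LOCK_NAME = "tdb.lock";

    // Static state is per classloader: each WAR would get its own copy of this set.
    private static final Set<Path> openedHere = new HashSet<>();

    /** Record that this instance is opening dbDir, or refuse if anyone else holds it. */
    static synchronized void open(Path dbDir) throws IOException {
        Path key = dbDir.toAbsolutePath().normalize();
        if (openedHere.contains(key)) {
            return; // this instance already has the database open
        }
        Path lock = key.resolve(LOCK_NAME);
        long self = ProcessHandle.current().pid();
        if (Files.exists(lock)) {
            long owner = Long.parseLong(Files.readString(lock).trim());
            if (owner == self) {
                // Same process id, but not opened by this instance: another
                // TDB instance (for example, another classloader) holds it.
                throw new IllegalStateException(
                    "Already open in this process by another TDB instance: " + key);
            }
            // Stale-lock handling for dead processes is left out of this sketch.
            throw new IllegalStateException("Locked by process " + owner + ": " + key);
        }
        Files.writeString(lock, Long.toString(self));
        openedHere.add(key);
    }

    /** Explicitly expel the location: forget it and remove the lock file. */
    static synchronized void expel(Path dbDir) throws IOException {
        Path key = dbDir.toAbsolutePath().normalize();
        if (openedHere.remove(key)) {
            Files.deleteIfExists(key.resolve(LOCK_NAME));
        }
    }
}
```

With something like this, tests that open the same location repeatedly would need to call expel (or otherwise clean up the lock file) between runs, which is the impact on the tests mentioned above.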

> Make TDB datasets harder to corrupt
> -----------------------------------
>
>                 Key: JENA-648
>                 URL: https://issues.apache.org/jira/browse/JENA-648
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>         Attachments: JENA-648-lock-files.patch
>
>
> This RFE comes out of discussions I had in person with Andy earlier this week.  On the mailing lists and Q&A sites we see a steady stream of questions from people who have corrupted TDB databases and it would be nice if we could put in place features that make this harder to do.
> There are two main things we should do in the long term as I see it:
> # Make using TDB non-transactionally more difficult
> # Put in place some mechanism to make it difficult for multiple JVMs to access the same TDB dataset simultaneously
> Andy and I think the first could be achieved by making TDB datasets operate in auto-commit mode rather than non-transactional mode by default.  To allow this we likely need upgradeable read transactions to be supported.  As part of this change non-transactional mode would still be supported, but users would have to explicitly set some "Here be Dragons" style flag to enable it.  Users who aren't currently using transactions would likely just see performance drop, since they would suddenly be getting an auto-commit on every operation, but when they complain we can tell them they should be using transactions properly to ensure their TDB databases remain uncorrupted.
> As far as the second point goes, we could likely do this the way a lot of other applications do: have the code write a lock file to disk when a database is opened, containing the owning process's PID.  Whenever a database is about to be opened, the code checks for the lock file and, if it is present, validates the PID, refusing to open the database if the PIDs do not match.  There would likely need to be some code to cope with the case where the lock file gets left around and the owning PID is no longer alive, but that shouldn't be too complicated.
> Since these may be considered substantial behavioural changes to TDB, they would likely go into Jena 3.
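
For illustration, the lock file scheme described in the issue could be sketched roughly as follows in Java. This is not the code in the attached patch: the class name PidLockFile, the file name "tdb.lock", and the method names are assumptions made for this example, and it assumes Java 11+ for ProcessHandle and Files.readString. The check-then-write sequence is also not atomic, so a real implementation would need to guard against two processes racing to create the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch only; not the JENA-648 patch.
final class PidLockFile {
    // Assumed lock file name, chosen for this example.
    static final String LOCK_NAME = "tdb.lock";

    /** Claim dbDir for this process, or throw if another live process owns it. */
    static void acquire(Path dbDir) throws IOException {
        Path lock = dbDir.resolve(LOCK_NAME);
        long self = ProcessHandle.current().pid();
        if (Files.exists(lock)) {
            long owner = Long.parseLong(Files.readString(lock).trim());
            if (owner != self && isAlive(owner)) {
                throw new IllegalStateException(
                    "Database " + dbDir + " is locked by process " + owner);
            }
            // Either our own PID or a stale lock from a dead process:
            // fall through and rewrite the file.
        }
        Files.writeString(lock, Long.toString(self));
    }

    /** Remove the lock file when the database is closed. */
    static void release(Path dbDir) throws IOException {
        Files.deleteIfExists(dbDir.resolve(LOCK_NAME));
    }

    /** True if a process with this PID currently exists and is running. */
    static boolean isAlive(long pid) {
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }
}
```

Note that a check of this shape accepts a lock file carrying the caller's own PID, so it only separates different processes; it cannot tell apart two users inside the same JVM.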



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)