Posted to dev@jena.apache.org by "Rob Vesse (JIRA)" <ji...@apache.org> on 2014/05/30 16:31:03 UTC

[jira] [Updated] (JENA-648) Make TDB datasets harder to corrupt

     [ https://issues.apache.org/jira/browse/JENA-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Vesse updated JENA-648:
---------------------------

    Attachment: JENA-648-lock-files.patch

Attaching a first-pass patch addressing point 2.

This patch adds support for a lock file {{tdb.lock}} in the database directory to prevent multi-JVM usage of a disk-based TDB database, providing some protection against the corruption that multi-JVM usage can cause.  The lock file contains the PID of the owning process, so it can be used to determine whether the current JVM owns the location.  If it does not, connections to the location are prevented, assuming I've put the locking early enough - Andy, can you look at this?
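
To make the idea concrete, here is a minimal Java sketch of a PID-based lock file.  It is not the code in the attached patch: the class {{PidLockFile}} and its methods are hypothetical names chosen for illustration, and the caller is assumed to have obtained its own PID by some other means.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not the class used in the patch.
final class PidLockFile {
    private final Path lockFile;

    PidLockFile(Path databaseDir) {
        // The lock file name tdb.lock comes from the description above.
        this.lockFile = databaseDir.resolve("tdb.lock");
    }

    /** Returns the PID recorded in the lock file, or -1 if there is no usable lock file. */
    long readOwner() {
        if (!Files.exists(lockFile)) {
            return -1;
        }
        try {
            String text = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).trim();
            return Long.parseLong(text);
        } catch (NumberFormatException e) {
            return -1; // empty or garbled lock file: treat as "no owner"
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Records the given PID as the owner, overwriting any previous content. */
    void writeOwner(long pid) {
        try {
            Files.write(lockFile, Long.toString(pid).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isOwnedBy(long pid) {
        return readOwner() == pid;
    }

    /** True if a lock file exists and names a PID other than the given one. */
    boolean isLockedByOther(long pid) {
        long owner = readOwner();
        return owner != -1 && owner != pid;
    }

    /** Removes the lock file, but only if the given PID is the recorded owner. */
    void release(long pid) {
        if (!isOwnedBy(pid)) {
            return;
        }
        try {
            Files.deleteIfExists(lockFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would check {{isLockedByOther(ownPid)}} before connecting, call {{writeOwner(ownPid)}} if the location is free, and call {{release(ownPid)}} when the connection is released.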

Currently the lock file is only removed when a connection is explicitly released, so stale locks may be left around.  The locking logic checks whether the owner of the lock is actually alive, though this won't be foolproof because the OS could have reassigned the PID to another process.
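
For illustration, a liveness check can be sketched in Java as below.  Note the assumptions: this uses {{ProcessHandle}}, which only exists on Java 9 and later, whereas the patch shells out to an OS command instead.  The class name {{PidLiveness}} is hypothetical.  The sketch has the same weakness described above: a reassigned PID will look alive.

```java
import java.util.Optional;

// Hypothetical helper for illustration only; assumes Java 9+ (ProcessHandle).
final class PidLiveness {
    /**
     * Returns true if the OS currently has a live process with this PID.
     * This cannot tell whether that process is the one that wrote the lock
     * file, because the OS may have reused the PID.
     */
    static boolean isAlive(long pid) {
        Optional<ProcessHandle> handle = ProcessHandle.of(pid);
        return handle.isPresent() && handle.get().isAlive();
    }
}
```

For example, {{PidLiveness.isAlive(ProcessHandle.current().pid())}} reports on the calling JVM itself.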

This is intended for 2.12.0 and is still rough around the edges in places.  Mostly it needs some logging and additional test cases writing.  Also, some of the existing test cases will fail on Windows, primarily because the process liveness check on Windows is more reliable.

There are some interesting edge cases where we don't lock:
- If we are unable to get our own PID
- If the lock file indicates the location is owned by another PID and we can't run the appropriate command - {{ps}} on *nix or {{tasklist}} on Windows - to determine whether the existing owner is alive
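
The decision logic, including the two edge cases above, can be summarised in a small Java sketch.  This is my reading of the behaviour described here rather than the patch's actual code: the names {{LockDecision}}, {{Action}} and {{decide}} are hypothetical, {{null}} stands for "could not be determined", and "we don't lock" is assumed to mean that the connection goes ahead without a lock.

```java
// Hypothetical summary of the locking decision; not taken from the patch.
final class LockDecision {
    enum Action { TAKE_LOCK, PROCEED_UNLOCKED, REFUSE }

    /**
     * @param ownPid     our own PID, or null if it could not be obtained
     * @param lockOwner  PID found in the lock file, or null if there is no lock file
     * @param ownerAlive whether lockOwner is a live process, or null if the
     *                   liveness command (ps / tasklist) could not be run
     */
    static Action decide(Long ownPid, Long lockOwner, Boolean ownerAlive) {
        if (ownPid == null) {
            return Action.PROCEED_UNLOCKED;   // edge case 1: own PID unknown
        }
        if (lockOwner == null || lockOwner.equals(ownPid)) {
            return Action.TAKE_LOCK;          // free, or already ours: take or keep it
        }
        if (ownerAlive == null) {
            return Action.PROCEED_UNLOCKED;   // edge case 2: liveness unknown
        }
        // A live foreign owner blocks us; a dead one left a stale lock behind.
        return ownerAlive ? Action.REFUSE : Action.TAKE_LOCK;
    }
}
```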

Another interesting edge case is what happens if the database directory is on a read-only file system.  In that case locking is unnecessary, but I haven't looked into being able to tell if a file system is read-only, and of course the file system may only appear read-only to a given process.  Other processes may be able to write to the location, in which case we'd still want locking to prevent multi-JVM use.  In general though I suspect TDB on a read-only file system may fail elsewhere in other interesting ways.
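
As a starting point, the standard Java NIO API offers two checks that could be combined, sketched below.  This is untested against TDB and the class name {{ReadOnlyCheck}} is hypothetical.  As noted above, the result only reflects what the current process sees, so it cannot prove that no other process can write to the location.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of the patch.
final class ReadOnlyCheck {
    /**
     * Returns true if the directory looks read-only to this process, either
     * because its file store is mounted read-only or because this process
     * lacks write permission on the directory.
     */
    static boolean appearsReadOnly(Path databaseDir) {
        try {
            FileStore store = Files.getFileStore(databaseDir);
            return store.isReadOnly() || !Files.isWritable(databaseDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```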

There are also fun race conditions: what happens if two processes start at almost the same time?  Both potentially see no lock (or a stale lock) and attempt to take the lock.  If we are particularly unlucky, both succeed with their individual writes to the lock file without conflicting, and both think they own the lock.  To try to avoid this, the patch to {{StoreConnection}} re-checks whether it does own the lock before proceeding, though I can still envisage race conditions where both processes believe they own the lock.
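
One way to narrow the "no lock yet" race is to create the lock file atomically, sketched below in Java.  This is a different technique from the re-check the patch uses, and {{AtomicLock}} and {{tryAcquire}} are hypothetical names.  It relies on {{StandardOpenOption.CREATE_NEW}}, whose existence check and file creation are documented as atomic with respect to other file system operations.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical alternative for illustration only; not what the patch does.
final class AtomicLock {
    /**
     * Tries to create the lock file and record the PID in it.
     * Returns false if the file already exists, i.e. someone else got there first.
     */
    static boolean tryAcquire(Path lockFile, long pid) {
        try {
            Files.write(lockFile,
                        Long.toString(pid).getBytes(StandardCharsets.UTF_8),
                        StandardOpenOption.CREATE_NEW,
                        StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false; // lost the race, or a (possibly stale) lock is present
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only helps when no lock file exists.  Taking over a stale lock still means deleting the old file and creating a new one, which reopens a window between the two steps, and a reader may briefly see the file before the PID has been written to it.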

We are unlikely to be able to prevent every race condition, so this may be moot; as long as the patch provides some level of corruption prevention, it meets the aim of the issue.

One other question: should we try to make the lock file hidden to make it harder for users to remove it?  I am tempted to say no, because a determined user can easily find and remove the lock file.  As long as this functionality protects most users, it is probably sufficient.

> Make TDB datasets harder to corrupt
> -----------------------------------
>
>                 Key: JENA-648
>                 URL: https://issues.apache.org/jira/browse/JENA-648
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: TDB
>            Reporter: Rob Vesse
>            Assignee: Rob Vesse
>         Attachments: JENA-648-lock-files.patch
>
>
> This RFE comes out of discussions I had in person with Andy earlier this week.  On the mailing lists and Q&A sites we see a steady stream of questions from people who have corrupted TDB databases and it would be nice if we could put in place features that make this harder to do.
> There are two main things we should do in the long term as I see it:
> # Make using TDB non-transactionally more difficult
> # Put in place some mechanism to make it difficult for multiple JVMs to access the same TDB dataset simultaneously
> Andy and I think the first could be achieved by making TDB datasets operate in auto-commit mode rather than non-transactional mode by default.  In order to allow this we likely need upgradeable read transactions to be supported.  As part of this change non-transactional mode would still be supported, but users would have to explicitly set some "Here be Dragons" style flag in order to use it.  Users who aren't using transactions currently would likely merely see performance drop, since suddenly they are getting auto-commits on every operation, but when they complain we can tell them they should be using transactions properly to ensure their TDB databases remain uncorrupted.
> As far as the second point goes, we could likely do this the way a lot of other applications do, by having the code write a lock file to disk when a database is opened which contains the owning process's PID.  Whenever you go to open a database, the presence of the lock file is checked for and, if present, the PID is validated, with the code refusing to open the database if the PIDs do not match.  There would likely need to be some code to cope with the case where the lock file gets left around and the owning PID is not alive, but that shouldn't be too complicated.
> Since these may be considered substantial behavioural changes to TDB, they will likely go into Jena 3.


