Posted to users@jena.apache.org by Tim Flicker <ti...@topquadrant.com> on 2020/06/08 13:33:26 UTC

Data backup and restore

Hi Jena Community,

I'm working on an auto data backup and restore feature for our platform 
which uses Jena for data access (gTDB and xTDB). The requirement is to 
have the application up during the backup operation although it can be 
taken down for restore. I've been looking into the tdbbackup and 
tdbloader scripts that come packaged with Jena.

Is it safe to run the tdbbackup script while graphs are being accessed, 
either for read or write? An alternative approach would be to 
programmatically place lock files which seems like it would put the 
system in "read only" mode. Once the lock files are in place, I could do 
a backup at the file system level then remove the locks once complete.

Any advice on how to proceed with this operation safely is greatly 
appreciated.

Regards,
Tim

Re: Data backup and restore

Posted by Andy Seaborne <an...@apache.org>.

On 09/06/2020 15:25, Tim Flicker wrote:
> Hi Andy,
> 
> Thanks for the response.
> 
> The plan is to implement new endpoints server side to do backup and 
> restore. The backup process will run in the same JVM as the server and 
> uses the same logic that is implemented in 
> org.apache.jena.tdb.TDBBackup.backup(...). My only concern is if this 
> operation is safe for both storage types while the application is 
> running and updates are potentially still occurring during the backup 
> process.

Yes - backup runs inside a "read" transaction.

> 
> The backup process will also grab all connector files.

To be robust, consider locking so connector files can't be created or 
deleted while the backup of the database runs.
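
Since the backup runs in the same JVM as the server, one way to sketch 
that locking is an in-process read/write lock: backups share the lock, 
while anything that creates or deletes a connector file takes it 
exclusively. The names below (ConnectorGuard, duringBackup, 
changeConnectors) are made up for illustration and are not part of Jena 
or of the platform.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

/** Hypothetical in-JVM guard around connector files (an assumption, not a Jena class). */
final class ConnectorGuard {
    // Fair mode, so a steady stream of backups cannot starve connector changes.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    /** Runs a backup. Backups may overlap each other, but no connector change runs meanwhile. */
    <T> T duringBackup(Supplier<T> backup) {
        lock.readLock().lock();
        try {
            return backup.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Creates or deletes connector files. Waits until no backup is running. */
    void changeConnectors(Runnable change) {
        lock.writeLock().lock();
        try {
            change.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** True while at least one backup holds the shared lock. */
    boolean backupInProgress() {
        return lock.getReadLockCount() > 0;
    }
}
```

This only protects against changes made through the same guard inside 
the server process; it does nothing for files edited from outside the JVM.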

> For restore, the plan is to take the application offline to ensure data 
> integrity.
> 
> Regards,
> Tim

Re: Data backup and restore

Posted by Tim Flicker <ti...@topquadrant.com>.
Hi Andy,

Thanks for the response.

The plan is to implement new endpoints server side to do backup and 
restore. The backup process will run in the same JVM as the server and 
uses the same logic that is implemented in 
org.apache.jena.tdb.TDBBackup.backup(...). My only concern is if this 
operation is safe for both storage types while the application is 
running and updates are potentially still occurring during the backup 
process.

The backup process will also grab all connector files.

For restore, the plan is to take the application offline to ensure data 
integrity.

Regards,
Tim

On 6/8/2020 5:50 PM, Andy Seaborne wrote:
> Hi Tim,
>
> Some context for our readers: gTDB and xTDB are different ways of 
> using TDB. Last I heard, it was TDB1, not that it makes very much 
> difference here.
>
> gTDB - one graph stored in the default graph of a TDB database.
>    Many graphs, many databases.
> xTDB - single, shared TDB database with graphs stored as named graphs.
>
> On 08/06/2020 14:33, Tim Flicker wrote:
>> Hi Jena Community,
>>
>> I'm working on an auto data backup and restore feature for our 
>> platform which uses Jena for data access (gTDB and xTDB). The 
>> requirement is to have the application up during the backup operation 
>> although it can be taken down for restore. I've been looking into the 
>> tdbbackup and tdbloader scripts that come packaged with Jena.
>
> Only one process can access a TDB database at a time so when the 
> server is running, only it can use the database.
>
> Live backup of databases has to be done by the server - that's what is 
> done by the backup servlet [1].
>
> The backup could be written to local disk or delivered over HTTP.
>
> curl -v -XPOST 'http://localhost:8080/tbl/backup?storage=xdb' --output 
> test.trig
>
> writes the entire database as a single TriG file.
>
> This backup is a single transactional snapshot of the database so the 
> data is consistent even if changes are also being made.
>
> gTDB is harder because the graphs are in many databases. There isn't 
> an easy way to back up all the graphs at once without additional code to 
> take a lock or transaction on each database - that's something outside 
> of TDB.
>
>
> Restore is either done separately (build the database, stop the server 
> and install it) or done by writing to the database from inside a running 
> server.
>
> That is for the TDB databases - in your situation, there are also the 
> configuration files on disk (connector files) that go with the graphs 
> - they aren't in TDB so these backup procedures aren't going to be 
> enough on their own. It is a problem if graphs have been added or 
> deleted from the system between backup and restore.
>
>> Is it safe to run the tdbbackup script while graphs are being accessed, 
>> either for read or write? 
>
> No.
> In fact, it should refuse to do it.
>
>> An alternative approach would be to programmatically place lock files 
>> which seems like it would put the system in "read only" mode.
>
> The TDB lock files control exclusive access, not a read/write mode.
>
>> Once the lock files are in place, I could do a backup at the file 
>> system level then remove the locks once complete.
>
> Parallel operation with the server is possible with transactions, 
> including having writers while a TDB backup is taken - the backup will 
> not see the changes, only the data in the database as it was when the 
> backup started.
>
>> Any advice on how to proceed with this operation safely is greatly 
>> appreciated.
>>
>> Regards,
>> Tim
>
>     Hope that helps,
>     Andy
>
> [1] 
> https://doc.topquadrant.com/6.3/backup-and-restore/#Live_Data_Backup_of_a_Shared_Graph_TDB
>


Re: Data backup and restore

Posted by Andy Seaborne <an...@apache.org>.
Hi Tim,

Some context for our readers: gTDB and xTDB are different ways of using 
TDB. Last I heard, it was TDB1, not that it makes very much difference here.

gTDB - one graph stored in the default graph of a TDB database.
    Many graphs, many databases.
xTDB - single, shared TDB database with graphs stored as named graphs.

On 08/06/2020 14:33, Tim Flicker wrote:
> Hi Jena Community,
> 
> I'm working on an auto data backup and restore feature for our platform 
> which uses Jena for data access (gTDB and xTDB). The requirement is to 
> have the application up during the backup operation although it can be 
> taken down for restore. I've been looking into the tdbbackup and 
> tdbloader scripts that come packaged with Jena.

Only one process can access a TDB database at a time so when the server 
is running, only it can use the database.

Live backup of databases has to be done by the server - that's what is 
done by the backup servlet [1].

The backup could be written to local disk or delivered over HTTP.

curl -v -XPOST 'http://localhost:8080/tbl/backup?storage=xdb' --output 
test.trig

writes the entire database as a single TriG file.

This backup is a single transactional snapshot of the database so the 
data is consistent even if changes are also being made.
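
The curl call above could also be issued from Java with the JDK's 
built-in HTTP client (Java 11 or later). This is only a sketch: the 
BackupClient class and its method names are hypothetical, and the base 
URL and storage parameter are simply the ones from the curl example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Hypothetical client for the backup servlet; mirrors the curl command. */
final class BackupClient {

    /** Builds an empty-body POST to {baseUrl}/backup?storage={storage}. */
    static HttpRequest buildBackupRequest(String baseUrl, String storage) {
        URI uri = URI.create(baseUrl + "/backup?storage=" + storage);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /**
     * Sends the request and streams the response body to the target file.
     * Returns the HTTP status code; the caller should check it, because the
     * body is written to the file even when the server reports an error.
     */
    static int downloadBackup(HttpClient client, HttpRequest request, Path target)
            throws IOException, InterruptedException {
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        return response.statusCode();
    }
}
```

For example, buildBackupRequest("http://localhost:8080/tbl", "xdb") 
followed by downloadBackup(HttpClient.newHttpClient(), request, 
Path.of("test.trig")) would correspond to the curl invocation.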

gTDB is harder because the graphs are in many databases. There isn't an 
easy way to back up all the graphs at once without additional code to take 
a lock or transaction on each database - that's something outside of TDB.
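
As a rough sketch of what that additional code might look like: hold an 
application-level gate while a read transaction is started on every 
database, then release the gate and write each snapshot out. Everything 
here is an assumption for illustration. GraphStore is a hypothetical 
interface standing in for one TDB database (in Jena terms, beginRead and 
endRead would correspond to something like Dataset.begin(ReadWrite.READ) 
and Dataset.end()), and the sketch only works if every writer in the 
application holds gate.readLock() while it commits.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical view of one gTDB database (not a Jena type). */
interface GraphStore {
    void beginRead();                               // start a read transaction
    void dump(OutputStream out) throws IOException; // write the snapshot seen by that transaction
    void endRead();                                 // finish the read transaction
}

final class MultiStoreBackup {
    /** Assumption: application writers hold gate.readLock() around each commit. */
    final ReentrantReadWriteLock gate = new ReentrantReadWriteLock();

    void backupAll(List<? extends GraphStore> stores, List<OutputStream> outputs)
            throws IOException {
        if (stores.size() != outputs.size()) {
            throw new IllegalArgumentException("need one output per store");
        }
        List<GraphStore> open = new ArrayList<>();
        try {
            // Block commits only while the read transactions are being started.
            gate.writeLock().lock();
            try {
                for (GraphStore store : stores) {
                    store.beginRead();
                    open.add(store);
                }
            } finally {
                gate.writeLock().unlock();
            }
            // Writers may continue now; each dump sees the state at beginRead().
            for (int i = 0; i < stores.size(); i++) {
                stores.get(i).dump(outputs.get(i));
            }
        } finally {
            for (GraphStore store : open) {
                store.endRead();
            }
        }
    }
}
```

The gate is held only for as long as it takes to start the transactions, 
so updates are paused briefly rather than for the whole backup.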


Restore is either done separately (build the database, stop the server 
and install it) or done by writing to the database from inside a running 
server.

That is for the TDB databases - in your situation, there are also the 
configuration files on disk (connector files) that go with the graphs - 
they aren't in TDB so these backup procedures aren't going to be enough 
on their own. It is a problem if graphs have been added or deleted from 
the system between backup and restore.

> Is it safe to run the tdbbackup script while graphs are being accessed, 
> either for read or write? 

No.
In fact, it should refuse to do it.

> An alternative approach would be to 
> programmatically place lock files which seems like it would put the 
> system in "read only" mode.

The TDB lock files control exclusive access, not a read/write mode.

> Once the lock files are in place, I could do 
> a backup at the file system level then remove the locks once complete.

Parallel operation with the server is possible with transactions, 
including having writers while a TDB backup is taken - the backup will 
not see the changes, only the data in the database as it was when the 
backup started.

> Any advice on how to proceed with this operation safely is greatly 
> appreciated.
> 
> Regards,
> Tim

     Hope that helps,
     Andy

[1] 
https://doc.topquadrant.com/6.3/backup-and-restore/#Live_Data_Backup_of_a_Shared_Graph_TDB