Posted to users@jena.apache.org by Jason Levitt <sl...@gmail.com> on 2015/08/24 06:51:38 UTC
Fuseki 2 HA or on-the-fly backups?
Just wondering if there are any projects out there
to provide:
1) HA (high availability) configuration of Fuseki such
as mirroring or hot/standby failover.
2) Some kind of on-the-fly backup of Fuseki when it's
running in RAM. This might be similar to how Hadoop
1.x "checkpoints" the in-RAM namenode data structures.
BTW, are there any tools for testing the consistency of the Fuseki
data structures when Fuseki is temporarily halted?
Cheers,
Jason
Re: Fuseki 2 HA or on-the-fly backups?
Posted by Andy Seaborne <an...@apache.org>.
On 24/08/15 05:51, Jason Levitt wrote:
> Just wondering if there are any projects out there
> to provide:
>
> 1) HA (high availability) configuration of Fuseki such
> as mirroring or hot/standby failover.
Some organisations achieve this by running a load balancer in front of
several replicas then co-ordinating the update process.
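A minimal sketch of that replica pattern, assuming each replica is a Fuseki server whose dataset exposes "/query" and "/update" endpoints (the replica URLs and the class name ReplicaRouter are hypothetical). Reads are spread round-robin. Each SPARQL update is POSTed to every replica using the SPARQL 1.1 Protocol content type "application/sparql-update". This is not ACID: if one replica rejects an update, the replicas diverge. Reconciling that divergence is the co-ordination problem that a system like Lizard tries to solve.

```python
import itertools
import urllib.error
import urllib.request
from typing import Callable, List

# Transport: (url, body) -> HTTP status code. Injected so the routing logic
# can be exercised without a network.
Sender = Callable[[str, str], int]


def http_send(url: str, body: str) -> int:
    """POST a SPARQL update; return the HTTP status (0 if unreachable)."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/sparql-update"},
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0


class ReplicaRouter:
    """Hypothetical helper: round-robin reads, send every update to all replicas."""

    def __init__(self, base_urls: List[str], send: Sender = http_send) -> None:
        if not base_urls:
            raise ValueError("need at least one replica")
        self.base_urls = list(base_urls)
        self.send = send
        self._next = itertools.cycle(self.base_urls)

    def query_endpoint(self) -> str:
        # Any replica can answer a read, so rotate through them.
        return next(self._next) + "/query"

    def update(self, sparql_update: str) -> List[str]:
        """Apply the update to every replica; return those that did not accept it."""
        failed = []
        for base in self.base_urls:
            status = self.send(base + "/update", sparql_update)
            if not 200 <= status < 300:
                failed.append(base)
        return failed
```

For example, ReplicaRouter(["http://replica1:3030/ds", "http://replica2:3030/ds"]).update("INSERT DATA { ... }") returns the replicas that need manual attention. An empty list means every replica accepted the change.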
There is an experimental system that can provide ACID consistency
across multiple servers - this is not part of Apache Jena.
https://github.com/afs/lizard
It uses Fuseki for the SPARQL protocol front-end.
It depends on TDB2 (in https://github.com/afs/mantis), which is upward-compatible
with TDB databases but not downward-compatible.
Once you run TDB2 on a database, going back is not easy (in fact, TDB1
will see the database as it was at the switchover state, without any later
updates, and will then corrupt the database if TDB1 does an update).
TDB2 has scalable transactions: you can load 100e6 (100 million) triples into
a live server, delete vast amounts of the database, etc.
Lizard and TDB2 are "experimental". In TDB2 full space recovery has been
written but not integrated into the codebase.
> 2) Some kind of on-the-fly backup of Fuseki when it's
> running in RAM. This might be similar to how Hadoop
> 1.x "checkpoints" the in-RAM namenode data structures.
You can do a live backup:
POST /$/backup/{name}
https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html
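Here is a sketch of calling that endpoint with only the Python standard library. The server address http://localhost:3030 and the dataset name "ds" are assumptions, and so are the function names. The backup file is written on the server side as a gzip-compressed N-Quads file. This client only triggers the backup and returns whatever response body the server sends. Depending on the server's security configuration, the /$/ admin endpoints may only be reachable from localhost.

```python
import urllib.request


def backup_request(server: str, dataset: str) -> urllib.request.Request:
    """Build a POST /$/backup/{name} request (hypothetical helper)."""
    # e.g. server="http://localhost:3030", dataset="ds" (no leading slash)
    url = f"{server.rstrip('/')}/$/backup/{dataset}"
    return urllib.request.Request(url, data=b"", method="POST")


def start_backup(server: str, dataset: str) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(backup_request(server, dataset), timeout=30) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Assumes a Fuseki 2 server on localhost with a dataset called "ds".
    print(start_backup("http://localhost:3030", "ds"))
```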
>
> BTW, are there any tools for testing the consistency of the Fuseki
> data structures when Fuseki is temporarily halted?
Not sure what "temporarily" means here - the server is either running or
not running. You can't, for example, suspend (SIGSTOP/SIGCONT) the Java
process and have the on-disk database guaranteed to be in sync with the
running server. It will be transactionally safe, though.
Taking a backup from a running server is one way to check a database.
It is not a perfect check, since it uses only one index, but it catches
node table breakage, which is usually the first thing to fail when TDB has
been run non-transactionally and not shut down cleanly. (Hint: never do
that; use transactions. TDB2 only supports transactions.)
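After a backup has completed, you can also give the resulting file a cheap scan. The sketch below is a rough line-level test, not an N-Quads parser, and the function name is hypothetical. It counts statements in a gzip-compressed N-Quads file and flags each non-blank, non-comment line that does not end with the "." terminator. A flagged line typically points to a truncated file. A line that carries a trailing comment would also be flagged.

```python
import gzip
from typing import Tuple


def scan_nquads_backup(path: str) -> Tuple[int, int]:
    """Return (statement_lines, suspect_lines) for a .nq.gz file.

    A suspect line is non-blank, is not a comment, and does not end with
    the '.' that terminates every N-Quads statement.
    """
    statements = suspect = 0
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            s = line.strip()
            if not s or s.startswith("#"):
                continue  # blank lines and comment lines are allowed
            statements += 1
            if not s.endswith("."):
                suspect += 1
    return statements, suspect
```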
>
> Cheers,
>
> Jason
>
Andy
Re: Fuseki 2 HA or on-the-fly backups?
Posted by Andy Seaborne <an...@apache.org>.
On 24/08/15 16:15, Jason Levitt wrote:
> Great info, thanks.
>
>> Some organisations achieve this by running a load balancer in front of
>> several replicas then co-ordinating the update process.
>
> So, they're running the same query against other nodes behind the load
> balancer to keep things in sync?
>
>> You can do a live backup
>
> So, an HTTP POST /$/backup/*{name}* initiates a backup and that
> results in a "gzip-compressed N-Quads file".
>
> What does a "restore" look like from that file?
You just load it into an empty database (with tdbloader, etc.).
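Here is a sketch of that restore step. It assumes the backup file has been copied from the server and that Jena's tdbloader script is on the PATH. The helper names are hypothetical. tdbloader's --loc option names the database directory. Jena's RDF readers should recognise the .gz suffix and decompress on the fly, but if they do not, gunzip the file first. Stop any server that uses the target directory before you run this, because a TDB database should not be opened by two JVMs at once.

```python
import os
import subprocess
from typing import List


def tdbloader_command(db_dir: str, backup_file: str) -> List[str]:
    # --loc names the TDB database directory to load into.
    return ["tdbloader", f"--loc={db_dir}", backup_file]


def restore(db_dir: str, backup_file: str) -> None:
    """Load a backup into an empty TDB database directory (created if missing)."""
    os.makedirs(db_dir, exist_ok=True)
    if os.listdir(db_dir):
        raise RuntimeError(f"{db_dir} is not empty; restore into an empty database")
    # check=True raises CalledProcessError if tdbloader exits with an error.
    subprocess.run(tdbloader_command(db_dir, backup_file), check=True)
```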
Andy
Re: Fuseki 2 HA or on-the-fly backups?
Posted by Jason Levitt <sl...@gmail.com>.
Great info, thanks.
> Some organisations achieve this by running a load balancer in front of
> several replicas then co-ordinating the update process.
So, they're running the same query against other nodes behind the load
balancer to keep things in sync?
> You can do a live backup
So, an HTTP POST /$/backup/{name} initiates a backup and that
results in a "gzip-compressed N-Quads file".
What does a "restore" look like from that file?
-J
On Mon, Aug 24, 2015 at 4:08 AM, Rob Vesse <rv...@dotnetrdf.org> wrote:
> Andy already answered 1 but more on 2
>
> Assuming you use TDB then in-memory checkpointing already happens. TDB
> caches data into memory but fundamentally is a persistent disk backed
> database that uses write-ahead logging for transactions and failure
> recovery so this already happens automatically and is below the level of
> Fuseki (you get this behaviour wherever you use TDB provided you use it
> transactionally which Fuseki always does)
>
> Rob
Re: Fuseki 2 HA or on-the-fly backups?
Posted by Rob Vesse <rv...@dotnetrdf.org>.
Andy has already answered (1), but here is more on (2).
Assuming you use TDB, in-memory checkpointing already happens. TDB
caches data in memory, but it is fundamentally a persistent, disk-backed
database that uses write-ahead logging for transactions and failure
recovery. So this already happens automatically, below the level of
Fuseki (you get this behaviour wherever you use TDB, provided you use it
transactionally, which Fuseki always does).
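To illustrate the write-ahead logging idea, here is a toy key-value store in Python. It is NOT TDB's implementation or file format. The file names and JSON records are invented for the example. It only shows the ordering that makes recovery possible. A commit is appended to a journal and forced to disk before the cached in-memory state changes. A restart replays the journal. A checkpoint writes the state to the main file and then truncates the journal.

```python
import json
import os
from typing import Dict


class ToyWalStore:
    """Toy illustration of write-ahead logging plus checkpointing (not TDB)."""

    def __init__(self, directory: str) -> None:
        self.db_path = os.path.join(directory, "data.json")     # hypothetical
        self.log_path = os.path.join(directory, "journal.log")  # hypothetical
        self.state: Dict[str, str] = {}
        self._recover()

    def _recover(self) -> None:
        # Start from the last checkpointed main file, if any ...
        if os.path.exists(self.db_path):
            with open(self.db_path, encoding="utf-8") as f:
                self.state = json.load(f)
        if not os.path.exists(self.log_path):
            return
        # ... then replay committed transactions from the journal.
        with open(self.log_path, encoding="utf-8") as log:
            lines = log.readlines()
        good = 0
        for line in lines:
            if not line.endswith("\n"):
                break  # torn final record: that commit never completed
            self.state.update(json.loads(line))
            good += 1
        if good < len(lines):
            # Drop the torn tail so later appends start on a clean line.
            with open(self.log_path, "w", encoding="utf-8") as log:
                log.writelines(lines[:good])

    def commit(self, writes: Dict[str, str]) -> None:
        # 1. Append to the journal and force it to disk first ...
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(writes) + "\n")
            log.flush()
            os.fsync(log.fileno())
        # 2. ... only then change the cached in-memory state.
        self.state.update(writes)

    def checkpoint(self) -> None:
        tmp = self.db_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.db_path)  # atomically install the new main file
        open(self.log_path, "w").close()  # journal is now redundant: truncate it
```

If a crash happens after the new main file is installed but before the journal is truncated, replaying the journal is harmless in this toy. Applying the same key-value writes twice gives the same state.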
Rob