Posted to users@jena.apache.org by Jason Levitt <sl...@gmail.com> on 2015/08/24 06:51:38 UTC

Fuseki 2 HA or on-the-fly backups?

Just wondering if there are any projects out there
to provide:

1) HA (high availability) configuration of Fuseki such
    as mirroring or hot/standby failover.

2) Some kind of on-the-fly backup of Fuseki when it's
    running in RAM. This might be similar to how Hadoop
    1.x "checkpoints" the in-RAM namenode data structures.

BTW, are there any tools for testing the consistency of the Fuseki
data structures when Fuseki is temporarily halted?

Cheers,

Jason

Re: Fuseki 2 HA or on-the-fly backups?

Posted by Andy Seaborne <an...@apache.org>.

On 24/08/15 05:51, Jason Levitt wrote:
> Just wondering if there are any projects out there
> to provide:
>
> 1) HA (high availability) configuration of Fuseki such
>      as mirroring or hot/standby failover.

Some organisations achieve this by running a load balancer in front of 
several replicas then co-ordinating the update process.
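
As a rough illustration of "co-ordinating the update process", the sketch below sends the same SPARQL Update to every replica and reports which ones failed. The host names, the dataset name "ds" and the helper names are hypothetical, and the failure handling is an assumption, not something Fuseki provides: this is not atomic across servers, so a replica that misses an update would need to be taken out of the load balancer and resynchronised.

```python
import urllib.parse
import urllib.request

# Hypothetical replica base URLs; "ds" is an assumed dataset name.
REPLICAS = [
    "http://fuseki-a:3030/ds",
    "http://fuseki-b:3030/ds",
]


def build_update_requests(sparql_update, replicas=REPLICAS):
    """Build one SPARQL Update POST per replica (nothing is sent here)."""
    body = urllib.parse.urlencode({"update": sparql_update}).encode("utf-8")
    return [
        urllib.request.Request(
            base + "/update",
            data=body,
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
        for base in replicas
    ]


def apply_everywhere(sparql_update, replicas=REPLICAS, send=urllib.request.urlopen):
    """Send the update to each replica in turn; return the URLs that failed."""
    failed = []
    for req in build_update_requests(sparql_update, replicas):
        try:
            with send(req, timeout=30) as resp:
                if resp.status >= 300:
                    failed.append(req.full_url)
        except OSError:  # URLError and HTTPError are subclasses of OSError
            failed.append(req.full_url)
    return failed
```

Queries can go to any replica through the load balancer; only updates need this kind of fan-out.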

There is an experimental system that can provide ACID consistency
across multiple servers - this is not part of Apache Jena.

https://github.com/afs/lizard

It uses Fuseki for the SPARQL protocol front-end.

It depends on TDB2 (in https://github.com/afs/mantis) which is upwardly 
compatible with TDB databases but not downward.

Once you run TDB2 on a database, going back is not easy (in fact, TDB1
will see the database at the switchover state, with no later updates, and
will then corrupt the database if TDB1 does an update.)

TDB2 has scalable transactions - load 100e6 triples into a live
server, delete vast amounts of the database, etc.

Lizard and TDB2 are "experimental". In TDB2 full space recovery has been 
written but not integrated into the codebase.

> 2) Some kind of on-the-fly backup of Fuseki when it's
>      running in RAM. This might be similar to how Hadoop
>      1.x "checkpoints" the in-RAM namenode data structures.

You can do a live backup:

POST /$/backup/{name}

https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html
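
A minimal sketch of triggering that backup from a script, using only the Python standard library. The server address, the dataset name "ds" and the function names are assumptions; the response is assumed to be a small JSON document describing the background task, so check the protocol page above for what your version returns. Access to the admin endpoints may also be restricted on your installation.

```python
import json
import urllib.parse
import urllib.request


def backup_request(server, dataset):
    """Build the POST /$/backup/{name} request for one dataset."""
    name = urllib.parse.quote(dataset.strip("/"), safe="")
    url = server.rstrip("/") + "/$/backup/" + name
    # An empty body is enough; the dataset is named in the URL.
    return urllib.request.Request(url, data=b"", method="POST")


def start_backup(server="http://localhost:3030", dataset="ds"):
    """Ask a running server to start a backup and return its reply."""
    with urllib.request.urlopen(backup_request(server, dataset), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    # Assumption: the reply is JSON; fall back to an empty dict if there is none.
    return json.loads(body) if body else {}
```

The backup itself runs in the background on the server, so a successful reply means only that the backup has been started.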

>
> BTW, are there any tools for testing the consistency of the Fuseki
> data structures when Fuseki is temporarily halted?

Not sure what "temporarily" means here - the server is either running or 
not running.  You can't, for example, suspend (SIGSTOP/SIGCONT) the Java 
process and have the on-disk database guaranteed to be in sync with the
running server.  It will be transactionally safe though.

Taking a backup from a running server is one way to check a database.
It is not a perfect check, as it uses only one index, but it catches
node table breakage, which is the usual first problem point when TDB has
been run non-transactionally and not shut down cleanly.  (Hint - never do
that, use transactions - TDB2 only supports transactions.)

>
> Cheers,
>
> Jason
>	

	Andy

Re: Fuseki 2 HA or on-the-fly backups?

Posted by Andy Seaborne <an...@apache.org>.

On 24/08/15 16:15, Jason Levitt wrote:
> Great info, thanks.
>
>> Some organisations achieve this by running a load balancer in front of
>> several replicas then co-ordinating the update process.
>
> So, they're running the same query against other nodes behind the load
> balancer to keep things in sync?
>
>> You can do a live backup
>
> So, an HTTP POST /$/backup/{name} initiates a backup and that
> results in a "gzip-compressed N-Quads file".
>
> What does a "restore" look like from that file?

You just load it into an empty database (tdbloader etc).
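
A sketch of such a restore, wrapping the tdbloader command line from Python. It assumes tdbloader is on the PATH, that no server is running against the target directory, and that the backup file name ends in .nq.gz; the function names and the empty-directory check are illustrative additions, not part of Jena.

```python
import os
import subprocess


def tdbloader_command(db_dir, backup_file):
    """Return the tdbloader argv, refusing to load into a non-empty database."""
    if os.path.isdir(db_dir) and os.listdir(db_dir):
        raise ValueError("target database directory is not empty: " + db_dir)
    return ["tdbloader", "--loc=" + db_dir, backup_file]


def restore(db_dir, backup_file):
    """Create the (empty) database directory and bulk-load the backup into it."""
    os.makedirs(db_dir, exist_ok=True)
    subprocess.run(tdbloader_command(db_dir, backup_file), check=True)
```

For example, restore("/path/to/new-db", "backup.nq.gz") would run tdbloader --loc=/path/to/new-db backup.nq.gz; both paths here are placeholders.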

	Andy

Re: Fuseki 2 HA or on-the-fly backups?

Posted by Jason Levitt <sl...@gmail.com>.

Great info, thanks.

> Some organisations achieve this by running a load balancer in front of
> several replicas then co-ordinating the update process.

So, they're running the same query against other nodes behind the load
balancer to keep things in sync?

> You can do a live backup

So, an HTTP POST /$/backup/{name} initiates a backup and that
results in a "gzip-compressed N-Quads file".

What does a "restore" look like from that file?

-J




On Mon, Aug 24, 2015 at 4:08 AM, Rob Vesse <rv...@dotnetrdf.org> wrote:
> Andy already answered 1 but more on 2
>
> Assuming you use TDB then in-memory checkpointing already happens.  TDB
> caches data into memory but fundamentally is a persistent disk backed
> database that uses write-ahead logging for transactions and failure
> recovery so this already happens automatically and is below the level of
> Fuseki (you get this behaviour wherever you use TDB provided you use it
> transactionally which Fuseki always does)
>
> Rob

Re: Fuseki 2 HA or on-the-fly backups?

Posted by Rob Vesse <rv...@dotnetrdf.org>.

Andy already answered 1, but more on 2.

Assuming you use TDB, in-memory checkpointing already happens.  TDB
caches data in memory but is fundamentally a persistent, disk-backed
database that uses write-ahead logging for transactions and failure
recovery.  So this happens automatically, below the level of Fuseki:
you get this behaviour wherever you use TDB, provided you use it
transactionally, which Fuseki always does.

Rob
