Posted to users@jena.apache.org by Simon Gray <si...@hum.ku.dk> on 2022/07/01 09:33:55 UTC

Save snapshot of schema cache?

Hi everyone, 

I love how Jena will download schemas on demand from URLs, but one issue I have with this feature is that it does not guarantee reproducibility, e.g. when there is no Internet connection or the schema server is offline.

I have downloaded some of the schemas I use and provide them locally, but Jena still occasionally errors out: not every schema exists locally, so the remaining ones are still downloaded over the network. Is there a simple way to persist a snapshot of the schema files downloaded by a Jena instance, so that I do not have to go and fetch all of them manually?

Kind regards,

Simon Gray

Software developer,
Centre for Language Technology,
University of Copenhagen

Re: Save snapshot of schema cache?

Posted by Dan Brickley <da...@danbri.org>.

Can I jump in here? Our FOAF web server had some (fairly unusual)
downtime at the weekend.

First, caching schema data is very sensible, especially for schemas that
aren't changing very rapidly (Dublin Core, FOAF, SKOS, etc.).
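A minimal Java sketch of that kind of cache, assuming nothing about Jena's internals: the class SchemaCache, its file-naming scheme and the injected fetcher are all hypothetical. The first request for a schema URL fetches it and writes a copy to disk; later requests are served from that copy, so a directory of snapshots builds up as a side effect.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;

/**
 * Hypothetical read-through schema cache (not part of Jena).
 * First request for a URL: fetch it and save a copy. Later requests: read the copy.
 */
public class SchemaCache {
    private final Path dir;
    private final Function<URI, String> fetcher; // e.g. an HTTP GET; injected so it can be stubbed

    public SchemaCache(Path dir, Function<URI, String> fetcher) {
        this.dir = dir;
        this.fetcher = fetcher;
    }

    /** Stable file name from host + path, e.g. http://xmlns.com/foaf/0.1/ -> xmlns.com_foaf_0.1_ */
    public Path fileFor(URI uri) {
        String name = (uri.getHost() + uri.getPath()).replaceAll("[^A-Za-z0-9._-]", "_");
        return dir.resolve(name);
    }

    public String get(URI uri) {
        Path file = fileFor(uri);
        try {
            if (Files.exists(file)) {
                return Files.readString(file, StandardCharsets.UTF_8); // served from the snapshot
            }
            String body = fetcher.apply(uri);                      // only hits the network once
            Files.createDirectories(dir);
            Files.writeString(file, body, StandardCharsets.UTF_8); // persist the snapshot
            return body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Passing the fetcher in keeps the cache independent of any particular HTTP client and makes it easy to test with a stub. The naming scheme is lossy (it ignores query strings, and different URLs can map to the same name), which is acceptable for a handful of well-known schemas but not in general.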

Second, I want to look again at signing the RDF/S data files, so that
there could be an alternative way to verify that cached data is (or was)
correct and hasn't been tampered with.

It seems Apache uses OpenPGP (GPG etc.) for this:

https://infra.apache.org/release-signing.html
https://httpd.apache.org/dev/verification.html

Would folk here be interested? Is there an obvious way to work with this
in pure Java (beyond compiling gpg to wasm, etc.)?
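Full OpenPGP verification in pure Java would need a library (Bouncy Castle, for example, implements OpenPGP). As a simpler stand-in, and explicitly not signature checking, the sketch below compares a cached file's bytes against a SHA-256 digest obtained from a trusted source; the class and method names are hypothetical. A digest only detects changes if the digest itself is trustworthy, whereas a signature also tells you who published the file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

/** Hypothetical helper: checks cached schema bytes against a published SHA-256 digest. */
public class DigestCheck {

    /** Lower-case hex SHA-256 of the given bytes. */
    public static String sha256Hex(byte[] data) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                sb.append(String.format("%02x", b)); // %x treats negative bytes as unsigned
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** True if the bytes match the expected hex digest (case-insensitive, constant-time compare). */
    public static boolean matches(byte[] cached, String expectedHex) {
        byte[] actual = sha256Hex(cached).getBytes(StandardCharsets.US_ASCII);
        byte[] expected = expectedHex.toLowerCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
        return MessageDigest.isEqual(actual, expected);
    }
}
```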

Dan

Re: Save snapshot of schema cache?

Posted by Andy Seaborne <an...@apache.org>.
Simon,

One way to do this is to install a caching proxy (e.g. an httpd or nginx
setup) and direct web traffic through that proxy.
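A sketch of the client side, assuming a caching proxy is already listening at a hypothetical address such as localhost:3128: it builds a java.net.http.HttpClient that sends every request through that proxy. How such a client is handed to Jena depends on the Jena version (recent releases, which use java.net.http, appear to provide org.apache.jena.http.HttpEnv.setDefaultHttpClient), so treat that step as an assumption to check against your version's documentation.

```java
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.http.HttpClient;
import java.time.Duration;

/** Hypothetical helper: an HTTP client that routes all requests through a local caching proxy. */
public class ProxiedClient {

    public static HttpClient viaProxy(String host, int port) {
        return HttpClient.newBuilder()
                // Every request goes to the proxy, which can answer from its cache.
                .proxy(ProxySelector.of(new InetSocketAddress(host, port)))
                // Schema URLs (e.g. purl.org) commonly redirect, so follow redirects.
                .followRedirects(HttpClient.Redirect.NORMAL)
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }
}
```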

Another way is to use the StreamManager/LocationMapper to map each URL to a
local file and manage that file yourself (e.g. with a regular script that
updates the local copy). Beware that there are two LocationMappers: the old
one in core, kept for (extreme!) legacy reasons, and the current one in
jena-arq/RIOT.
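In Jena itself this mapping is configured on LocationMapper (for example via its addAltEntry and addAltPrefix methods); the exact wiring into StreamManager differs between versions, so check the RIOT documentation. The stand-alone sketch below is not Jena code and its class and method names are hypothetical; it only illustrates the lookup idea: an exact URL entry wins, otherwise the longest matching URL prefix is rewritten to a local prefix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical URL-to-local-location mapper, loosely modelled on the LocationMapper idea. */
public class LocalSchemaMap {
    private final Map<String, String> exact = new HashMap<>();
    private final Map<String, String> prefixes = new HashMap<>();

    public LocalSchemaMap addEntry(String uri, String localPath) {
        exact.put(uri, localPath);
        return this;
    }

    public LocalSchemaMap addPrefix(String uriPrefix, String localPrefix) {
        prefixes.put(uriPrefix, localPrefix);
        return this;
    }

    /** Local location for a URI, or empty if unmapped (i.e. it would be fetched from the web). */
    public Optional<String> resolve(String uri) {
        String hit = exact.get(uri);
        if (hit != null) {
            return Optional.of(hit);
        }
        String best = null; // longest prefix that matches the URI
        for (String p : prefixes.keySet()) {
            if (uri.startsWith(p) && (best == null || p.length() > best.length())) {
                best = p;
            }
        }
        if (best == null) {
            return Optional.empty();
        }
        return Optional.of(prefixes.get(best) + uri.substring(best.length()));
    }
}
```

An unmapped URL falls through with an empty result, which is exactly the gap described in the original question; a strict offline mode could treat an empty result as an error instead, so that missing snapshots are noticed early rather than when the network is down.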

     Andy
