Posted to users@jena.apache.org by Bill Roberts <bi...@swirrl.com> on 2011/09/09 08:55:47 UTC

practical max TDBs per Joseki/Fuseki instance?

I'm investigating some possibilities around an app that would deal with personal data for a large number of people, where it is important that users can't see data without authorisation. My first thought was to put everything into one TDB and select data for individual users via SPARQL, hiding the SPARQL endpoint behind an 'outer' API layer that would manage authentication etc.: a 'classic' database-backed web-app approach.

Another possibility might be to have a separate TDB per person, which would allow a SPARQL endpoint for each TDB to be exposed to the outside world (with authentication of some sort). But that could require a large number of TDBs.

I'm just wondering what architectural or server-resource limits there might be when running an instance of Joseki or Fuseki with a very large number of different services, each with its own URL and its own TDB store?
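
To make the per-person layout concrete, here is a minimal sketch of how a front-end layer might map an authenticated user to that user's own service URL. The base URL, the "user-<id>" dataset naming, and the function names are all assumptions for illustration; only the general "/<dataset>/query" shape follows Fuseki's usual service layout.

```python
from urllib.parse import quote

# Hypothetical base URL of the Fuseki server (not taken from this thread).
FUSEKI_BASE = "http://localhost:3030"


def service_url(user_id: str) -> str:
    """Return the query endpoint of the dataset holding one user's data."""
    # Percent-encode the id so it cannot inject extra path segments.
    return f"{FUSEKI_BASE}/user-{quote(user_id, safe='')}/query"


def endpoint_for(authenticated_user: str, requested_user: str) -> str:
    """Allow a user to reach only their own dataset."""
    if authenticated_user != requested_user:
        raise PermissionError("not authorised for this dataset")
    return service_url(requested_user)
```

With this layout the authorisation check is a simple identity comparison, but every user costs one configured service and one TDB store on the server, which is exactly the scaling question above.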

The other alternative is to have each person's data in a separate file and just query that using SELECT ... FROM. Data volume per person might be small enough to make that feasible.
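
As a rough sketch of the file-per-person idea, the outer layer could build the FROM clause itself so that the caller never chooses which file is read. The file location, the ".ttl" naming, and the function name are assumptions, and this only works if the query processor is configured to load the graphs named in FROM.

```python
import re

# Hypothetical location of the per-person RDF files.
DATA_ROOT = "file:///data/people"

# Accept only simple ids so the id cannot escape the data directory.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def query_for_person(person_id: str, where_clause: str) -> str:
    """Build a SELECT query whose dataset is that one person's file."""
    if not _SAFE_ID.fullmatch(person_id):
        raise ValueError(f"unsafe person id: {person_id!r}")
    return (
        "SELECT *\n"
        f"FROM <{DATA_ROOT}/{person_id}.ttl>\n"
        f"WHERE {{ {where_clause} }}"
    )
```

For example, query_for_person("alice", "?s ?p ?o") yields a query that reads only alice.ttl. The file is parsed on every request, so this only makes sense while each person's data stays small.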

Thanks for any insight on this!

Bill

Re: practical max TDBs per Joseki/Fuseki instance?

Posted by Damian Steer <d....@bristol.ac.uk>.

Doing this on my phone. Apologies for any mistakes.

Sent from my iPhone

On 9 Sep 2011, at 08:43, Paolo Castagna <ca...@googlemail.com> wrote:

> There is something interesting here: http://openjena.org/wiki/TDB/QuadFilter if you need to work at the lowest level, excluding just a few triples. I've never tried or used it myself (yet).
> The easier option could be to have different datasets in Fuseki and use something in front of it for authorization/authentication.

In Caboto we enforce graph-level security [1] by rewriting queries to add a FILTER for every GRAPH pattern. It's much like the TDB quad filter but will work over any SPARQL store, although the union default graph can't be handled.
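
A toy illustration of that kind of rewriting, not Caboto's actual code: after every GRAPH ?var { ... } block it appends a FILTER that limits the variable to a list of permitted graph IRIs. The function name and the regex-based approach are assumptions for this sketch. It ignores braces inside string literals or comments, GRAPH blocks that name a constant IRI, and patterns outside any GRAPH block; a real implementation would work on the parsed query.

```python
import re

# Matches the start of a GRAPH block whose graph name is a ?variable.
GRAPH_VAR = re.compile(r"GRAPH\s+(\?\w+)\s*\{", re.IGNORECASE)


def restrict_graphs(query: str, allowed: list[str]) -> str:
    """Add a FILTER after each GRAPH ?var block limiting ?var to allowed IRIs."""
    iri_list = ", ".join(f"<{iri}>" for iri in allowed)
    inserts = []
    for match in GRAPH_VAR.finditer(query):
        # Walk forward to the brace that closes this GRAPH block.
        depth, i = 1, match.end()
        while depth and i < len(query):
            depth += {"{": 1, "}": -1}.get(query[i], 0)
            i += 1
        if depth:
            raise ValueError("unbalanced braces in query")
        # The FILTER goes in the enclosing group, where the variable is bound.
        inserts.append((i, f" FILTER({match.group(1)} IN ({iri_list}))"))
    # Apply from the end so earlier insert positions stay valid.
    for pos, text in sorted(inserts, reverse=True):
        query = query[:pos] + text + query[pos:]
    return query
```

The FILTER is placed after the closing brace rather than inside the block because, under the SPARQL algebra, the graph variable is only joined in once the inner pattern has been evaluated.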

We also use a query distributor called arnos, which could give you the beginnings of a SPARQL graph sharding scheme, I suppose.

Damian

[1] <http://code.google.com/p/caboto/source/browse/caboto/src/main/java/org/caboto/security/sparql>

RE: practical max TDBs per Joseki/Fuseki instance?

Posted by David Jordan <Da...@sas.com>.

I may also need (I'm not certain yet) to support an authorization scheme to control what users see.

-----Original Message-----
From: Paolo Castagna [mailto:castagna.lists@googlemail.com] 
Sent: Friday, September 09, 2011 3:44 AM
To: jena-users@incubator.apache.org
Subject: Re: practical max TDBs per Joseki/Fuseki instance?

Hi Bill,
I don't have practical/useful suggestions to add to your list, nor do I have a particular recommendation on this at the moment.
I just want to add myself to the list of people interested in this; it's really important and critical for anyone wanting to serve a SPARQL endpoint (public|private) with some sort of authentication/authorization.

There is something interesting here: http://openjena.org/wiki/TDB/QuadFilter if you need to work at the lowest level, excluding just a few triples. I've never tried or used it myself (yet).
The easier option could be to have different datasets in Fuseki and use something in front of it for authorization/authentication.

Re: requirement for a large number of TDBs
Once again, I am clearly interested in this. Maybe one thing that could be shared between different TDBs would be the node table and the graph/prefix/URI table, but that is not possible at the moment.

Is it a coincidence that we have so many similar needs/requirements, or are these just common needs for everyone wanting to use Fuseki in production? ;-)

Sorry for not being more helpful, but you are not alone.

Paolo

Re: practical max TDBs per Joseki/Fuseki instance?

Posted by Paolo Castagna <ca...@googlemail.com>.

Hi Bill,
I don't have practical/useful suggestions to add to your list, nor do I have a particular recommendation on this at the moment.
I just want to add myself to the list of people interested in this; it's really important and critical for anyone wanting to serve a SPARQL endpoint (public|private) with some sort of authentication/authorization.

There is something interesting here: http://openjena.org/wiki/TDB/QuadFilter if you need to work at the lowest level, excluding just a few triples. I've never tried or used it myself (yet).
The easier option could be to have different datasets in Fuseki and use something in front of it for authorization/authentication.

Re: requirement for a large number of TDBs
Once again, I am clearly interested in this. Maybe one thing that could be shared between different TDBs would be the node table and the graph/prefix/URI table, but that is not possible at the moment.

Is it a coincidence that we have so many similar needs/requirements, or are these just common needs for everyone wanting to use Fuseki in production? ;-)

Sorry for not being more helpful, but you are not alone.

Paolo