
Posted to users@jena.apache.org by Sarven Capadisli <in...@csarven.ca> on 2013/01/27 16:30:09 UTC

Fuseki service with multiple TDB datasets

Hi,

Is it possible to run a single Fuseki server with multiple TDB datasets 
such that the queries span over those datasets?

For instance, a query performed at service A/query (with dataset A) may 
also contain results from dataset B, and vice versa for a query performed 
at service B/query.

I realize that it may be a bit too much to ask for something like this 
(and probably not possible), but my idea was to use separate databases 
so that I can update them independently with a lot of flexibility, 
while still having the SPARQL endpoint(s) see all of the data without 
the need for federated queries.

If I use the same fuseki:name for each fuseki:Service, the last one 
takes over and the query result reveals data only from its dataset.

Thanks,

-Sarven

http://csarven.ca/#i

Re: Fuseki service with multiple TDB datasets

Posted by Andy Seaborne <an...@apache.org>.

On 27/01/13 22:49, Sarven Capadisli wrote:
> On 01/27/2013 11:01 PM, Andy Seaborne wrote:
>> On 27/01/13 15:30, Sarven Capadisli wrote:
>>> Hi,
>>>
>>> Is it possible to run a single Fuseki server with multiple TDB datasets
>>> such that the queries span over those datasets?
>>
>> The short answer is "no".
>>
>> The long answer is that a union dataset could be developed that gave a
>> single dataset view over two or more sub-datasets.  It would not be able
>> to push whole queries down to the sub-datasets, so there are performance
>> implications at scale, but for many uses it might be useful.
>>
>> SERVICE (SPARQL federated query) is the nearest it gets.
>>
>> (But why not use one dataset with careful named graphs?  You can use GSP
>> (the Graph Store Protocol) to manage the data, which is convenient.)
>
> I am using named graphs. I understand that it can work that way;
> however, I find it inconvenient to drop or update graphs, and I can't
> say that I generally get a /good/ response rate compared to simply
> rebuilding the store. Having separate databases also makes it easier to
> hotswap.
>
>>> For instance, a query performed at service A/query (with dataset A) may
>>> also contain results from dataset B, and vice versa for a query performed
>>> at service B/query.
>>>
>>> I realize that it may be a bit too much to ask for something like this
>>> (and probably not possible), but my idea was to use separate databases
>>> so that I can update them independently with a lot of flexibility,
>>> while still having the SPARQL endpoint(s) see all of the data without
>>> the need for federated queries.
>>
>> What happens (= what does it mean) if both datasets have the same named
>> graph?
>
> I've just tried that and it looks like it had no effect. It still showed
> the results from the second service.
>
>>> If I use the same fuseki:name for each fuseki:Service, the last one
>>> takes over and the query result reveals data only from its dataset.
>>
>> The names have to be different.  There probably should have been a
>> warning/error if a name is reused (patches welcome!).
>
> Gave no warning or error.

No, it won't. "Probably should" was meant to say "the code needs fixing 
to make it issue a warning".

At the moment, don't reuse service names.

The last service will be installed as the actual one, but "last" may be 
last when iterating over an internal hash table, not necessarily last 
in the configuration file.

>
> Here is what I currently have in my TDB assembler:
>
> https://gist.github.com/4650969
>
> I was using the same fuseki:name so that I could later have a single
> RewriteRule pointing at the same query service, e.g.,
> http://localhost:3030/data/query, to be used from different SPARQL
> endpoints. But I guess that's not going to happen in any case.
>
> If you'd like me to generate particular logging for this or force a
> warning/error somehow, please let me know how to go about that and I'll
> report back.

Patches attached to JIRA are best for me.

The check probably belongs somewhere near "FusekiConfig.configure", 
either as a sanity pre-check or when adding to the server config object 
(at a guess; it's been a while since I went near that code).
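For anyone picking up the patch, the difference between the current behaviour and a sanity pre-check can be sketched in a few lines. This is a minimal Python illustration only, not Fuseki code: register_naive, duplicate_names, register_checked and the "tdb:datasetA"/"tdb:datasetB" values are all hypothetical. Note that Python dicts keep insertion order, so "last" here is simply config order; as noted above, in Fuseki "last" may instead follow hash-table iteration order.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical service list: (fuseki:name, dataset) pairs as read from a config.
Service = Tuple[str, str]


def register_naive(services: Iterable[Service]) -> Dict[str, str]:
    # Registering by name in a map: a later entry silently replaces an
    # earlier one with the same name, which is the symptom in this thread.
    registry: Dict[str, str] = {}
    for name, dataset in services:
        registry[name] = dataset
    return registry


def duplicate_names(services: Iterable[Service]) -> List[str]:
    # Sanity pre-check: report each name that occurs more than once,
    # in the order its first repeat is seen.
    seen = set()
    dups: List[str] = []
    for name, _ in services:
        if name in seen and name not in dups:
            dups.append(name)
        seen.add(name)
    return dups


def register_checked(services: List[Service]) -> Dict[str, str]:
    # Refuse the whole configuration instead of silently dropping a service.
    dups = duplicate_names(services)
    if dups:
        raise ValueError("fuseki:name used more than once: " + ", ".join(dups))
    return register_naive(services)


config = [("data", "tdb:datasetA"), ("data", "tdb:datasetB")]
print(register_naive(config))    # {'data': 'tdb:datasetB'}
print(duplicate_names(config))   # ['data']
```

Failing loudly (or at least logging a warning) at configuration time is the point; whether to stop the server or just warn is a policy choice for the real patch.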

	Andy

>
>> You'll be wanting datasets-of-datasets next!
>
> Of course! Anything less would be uncivilized.
>
> -Sarven
>

Re: Fuseki service with multiple TDB datasets

Posted by Sarven Capadisli <in...@csarven.ca>.

On 01/27/2013 11:01 PM, Andy Seaborne wrote:
> On 27/01/13 15:30, Sarven Capadisli wrote:
>> Hi,
>>
>> Is it possible to run a single Fuseki server with multiple TDB datasets
>> such that the queries span over those datasets?
>
> The short answer is "no".
>
> The long answer is that a union dataset could be developed that gave a
> single dataset view over two or more sub-datasets.  It would not be able
> to push whole queries down to the sub-datasets, so there are performance
> implications at scale, but for many uses it might be useful.
>
> SERVICE (SPARQL federated query) is the nearest it gets.
>
> (But why not use one dataset with careful named graphs?  You can use GSP
> (the Graph Store Protocol) to manage the data, which is convenient.)

I am using named graphs. I understand that it can work that way; 
however, I find it inconvenient to drop or update graphs, and I can't 
say that I generally get a /good/ response rate compared to simply 
rebuilding the store. Having separate databases also makes it easier to 
hotswap.

>> For instance, a query performed at service A/query (with dataset A) may
>> also contain results from dataset B, and vice versa for a query performed
>> at service B/query.
>>
>> I realize that it may be a bit too much to ask for something like this
>> (and probably not possible), but my idea was to use separate databases
>> so that I can update them independently with a lot of flexibility,
>> while still having the SPARQL endpoint(s) see all of the data without
>> the need for federated queries.
>
> What happens (= what does it mean) if both datasets have the same named
> graph?

I've just tried that and it looks like it had no effect. It still showed 
the results from the second service.

>> If I use the same fuseki:name for each fuseki:Service, the last one
>> takes over and the query result reveals data only from its dataset.
>
> The names have to be different.  There probably should have been a
> warning/error if a name is reused (patches welcome!).

Gave no warning or error.

Here is what I currently have in my TDB assembler:

https://gist.github.com/4650969

I was using the same fuseki:name so that I could later have a single 
RewriteRule pointing at the same query service, e.g., 
http://localhost:3030/data/query, to be used from different SPARQL 
endpoints. But I guess that's not going to happen in any case.

If you'd like me to generate particular logging for this or force a 
warning/error somehow, please let me know how to go about that and I'll 
report back.

> You'll be wanting datasets-of-datasets next!

Of course! Anything less would be uncivilized.

-Sarven

Re: Fuseki service with multiple TDB datasets

Posted by Andy Seaborne <an...@apache.org>.

On 27/01/13 15:30, Sarven Capadisli wrote:
> Hi,
>
> Is it possible to run a single Fuseki server with multiple TDB datasets
> such that the queries span over those datasets?

The short answer is "no".

The long answer is that a union dataset could be developed that gave a 
single dataset view over two or more sub-datasets.  It would not be able 
to push whole queries down to the sub-datasets, so there are performance 
implications at scale, but for many uses it might be useful.
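To make the union-dataset idea a little more concrete, here is a minimal Python sketch under stated assumptions: UnionDatasetView, the dict-of-sets dataset model and the ex: names are all hypothetical and not part of Jena or Fuseki. It shows why such a view can only hand single triple patterns down to each sub-dataset (joins and filters happen above it), and it picks one possible answer to the "same named graph" question by merging same-named graphs.

```python
from typing import Dict, Iterator, Optional, Set, Tuple

# Plain strings stand in for RDF terms in this sketch.
Triple = Tuple[str, str, str]
# Hypothetical in-memory "dataset": graph name -> set of triples.
Dataset = Dict[str, Set[Triple]]


class UnionDatasetView:
    """Read-only view presenting several sub-datasets as one dataset.

    Hypothetical illustration, not a Jena or Fuseki class.
    """

    def __init__(self, *datasets: Dataset) -> None:
        self.datasets = datasets

    def graph_names(self) -> Set[str]:
        # The view contains every graph name of every sub-dataset.
        names: Set[str] = set()
        for ds in self.datasets:
            names.update(ds.keys())
        return names

    def find(self, graph: str, s: Optional[str] = None,
             p: Optional[str] = None, o: Optional[str] = None) -> Iterator[Triple]:
        # Only this single triple pattern reaches each sub-dataset; joins,
        # filters and the rest of the query must run above this layer,
        # which is where the performance cost at scale comes from.
        seen: Set[Triple] = set()
        for ds in self.datasets:
            for t in ds.get(graph, set()):
                if ((s is None or t[0] == s) and (p is None or t[1] == p)
                        and (o is None or t[2] == o) and t not in seen):
                    # A graph name present in several sub-datasets is treated
                    # as the union of their triples (one possible choice).
                    seen.add(t)
                    yield t


# Usage: both datasets contain a graph named ex:g1.
a: Dataset = {"ex:g1": {("ex:a", "ex:p", "1")}}
b: Dataset = {"ex:g1": {("ex:b", "ex:p", "2")},
              "ex:g2": {("ex:c", "ex:p", "3")}}
view = UnionDatasetView(a, b)
print(sorted(view.find("ex:g1", p="ex:p")))
# [('ex:a', 'ex:p', '1'), ('ex:b', 'ex:p', '2')]
```

Merging is only one reading; a view could equally reject duplicate graph names or let one sub-dataset shadow the other, which is exactly the semantic question such a feature would have to settle.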

SERVICE (SPARQL federated query) is the nearest it gets.
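As a rough illustration of the SERVICE route, the sketch below builds a SPARQL 1.1 query that matches a pattern in the local dataset and, via SERVICE, in a second endpoint, then encodes it as a SPARQL Protocol GET request. The endpoint URLs are assumptions based on the A/query and B/query naming in this thread and Fuseki's default port 3030; the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint URLs, following the thread's A/query and B/query naming
# and Fuseki's default port 3030; adjust to the real configuration.
LOCAL_ENDPOINT = "http://localhost:3030/A/query"
REMOTE_ENDPOINT = "http://localhost:3030/B/query"


def union_with_service(remote_endpoint: str, pattern: str) -> str:
    """Build a SPARQL 1.1 query matching `pattern` locally and remotely.

    The first branch is evaluated against the dataset behind the endpoint
    that receives the query; the SERVICE branch is sent to `remote_endpoint`.
    """
    return ("SELECT * WHERE {\n"
            "  { " + pattern + " }\n"
            "  UNION\n"
            "  { SERVICE <" + remote_endpoint + "> { " + pattern + " } }\n"
            "}")


def query_get_url(endpoint: str, query: str) -> str:
    # SPARQL Protocol "query via GET": the query goes in the `query` parameter.
    return endpoint + "?" + urlencode({"query": query})


# GRAPH ?g because the data in this thread lives in named graphs.
q = union_with_service(REMOTE_ENDPOINT, "GRAPH ?g { ?s ?p ?o }")
print(q)
url = query_get_url(LOCAL_ENDPOINT, q)
# With a running server, urllib.request.urlopen(url) would send it.
```

This is still a federated query, so every query has to be written with the SERVICE clause in it; it does not give the transparent "one endpoint sees everything" behaviour asked for.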

(But why not use one dataset with careful named graphs?  You can use GSP 
(the Graph Store Protocol) to manage the data, which is convenient.)
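A minimal sketch of managing one named graph over GSP from Python's standard library follows. It only builds the requests: PUT replaces a graph's content and DELETE drops it, with the graph identified through the ?graph= parameter. The store URL (a dataset published as "ds" with a graph store service named "data") and the graph name are assumptions, not taken from the configuration in this thread.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

# Assumed Graph Store Protocol endpoint: dataset "ds" with a read-write
# graph store service named "data". Adjust to the real configuration.
STORE_URL = "http://localhost:3030/ds/data"


def gsp_request(store_url: str, graph_uri: str, method: str,
                turtle: Optional[str] = None) -> urllib.request.Request:
    """Build a Graph Store Protocol request for one named graph.

    The graph is identified indirectly via the ?graph= query parameter.
    PUT replaces the graph's content, POST adds to it, DELETE drops it,
    GET fetches it.
    """
    url = store_url + "?" + urlencode({"graph": graph_uri})
    data = turtle.encode("utf-8") if turtle is not None else None
    headers = {"Content-Type": "text/turtle; charset=utf-8"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


GRAPH = "http://example.org/g1"  # hypothetical graph name
replace = gsp_request(STORE_URL, GRAPH, "PUT",
                      "<http://example.org/s> <http://example.org/p> 1 .")
drop = gsp_request(STORE_URL, GRAPH, "DELETE")
# With a running server: urllib.request.urlopen(replace), and so on.
```

Replacing a whole graph with one PUT is the closest GSP equivalent of rebuilding a separate store; whether it is fast enough for large graphs is the performance concern raised in the reply above.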

> For instance, a query performed at service A/query (with dataset A) may
> also contain results from dataset B, and vice versa for a query performed
> at service B/query.
>
> I realize that it may be a bit too much to ask for something like this
> (and probably not possible), but my idea was to use separate databases
> so that I can update them independently with a lot of flexibility,
> while still having the SPARQL endpoint(s) see all of the data without
> the need for federated queries.

What happens (= what does it mean) if both datasets have the same named 
graph?

> If I use the same fuseki:name for each fuseki:Service, the last one
> takes over and the query result reveals data only from its dataset.

The names have to be different.  There probably should have been a 
warning/error if a name is reused (patches welcome!).

>
> Thanks,
>
> -Sarven

You'll be wanting datasets-of-datasets next!

	Andy

>
> http://csarven.ca/#i
>