Posted to users@jena.apache.org by Sarven Capadisli <in...@csarven.ca> on 2013/01/27 16:30:09 UTC
Fuseki service with multiple TDB datasets
Hi,
Is it possible to run a single Fuseki server with multiple TDB datasets
such that the queries span over those datasets?
For instance, a query performed at service A/query (with dataset A) may
also contain results from dataset B. And vice versa if the query is performed
at service B/query.
I realize that it may be a bit too much to ask for something like this
(and probably not possible), but my idea was to use separate databases
so that I can update them independently with a lot of flexibility,
meanwhile having the SPARQL endpoint(s) see all of the data without the
need for federated queries.
If I use the same fuseki:name for each fuseki:Service, the last one
takes over and the query result reveals data only from its dataset.
Thanks,
-Sarven
http://csarven.ca/#i
Re: Fuseki service with multiple TDB datasets
Posted by Andy Seaborne <an...@apache.org>.
On 27/01/13 22:49, Sarven Capadisli wrote:
> On 01/27/2013 11:01 PM, Andy Seaborne wrote:
>> On 27/01/13 15:30, Sarven Capadisli wrote:
>>> Hi,
>>>
>>> Is it possible to run a single Fuseki server with multiple TDB datasets
>>> such that the queries span over those datasets?
>>
>> The short answer is "no".
>>
>> The long answer is that a union dataset could be developed that gave a
>> dataset view of 2+ sub-datasets. It would not be able to push down full
>> queries so there are performance implications at scale but for many uses
>> it might be useful.
>>
>> SERVICE is the nearest it gets.
>>
>> (But why not use one dataset with careful named graphs? You can use GSP
>> (Graph Store Protocol) to manage the data, which is convenient.)
>
> I am using named graphs. I understand that it can work that way,
> however, I find it inconvenient to drop or update graphs - and I can't
> say that I generally have a /good/ response rate in comparison to simply
> rebuilding the store. Having separate databases also makes it easier to
> hotswap.
>
>>> For instance, a query performed at service A/query (with dataset A) may
>>> also contain results from dataset B. And vice versa if the query is performed
>>> at service B/query.
>>>
>>> I realize that it may be a bit too much to ask for something like this
>>> (and probably not possible), but my idea was to use separate databases
>>> so that I can update them independently with a lot of flexibility,
>>> meanwhile having the SPARQL endpoint(s) see all of the data without the
>>> need for federated queries.
>>
>> What happens (= what does it mean) if both datasets have the same named
>> graph?
>
> I've just tried that and it looks like it had no effect. It still showed
> the results from the second service.
>
>>> If I use the same fuseki:name for each fuseki:Service, the last one
>>> takes over and the query result reveals data only from its dataset.
>>
>> The names have to be different. There probably should have been a
>> warning/error if a name is reused (patches welcome!).
>
> Gave no warning or error.
No, it won't. "Probably should" was meant to say "the code needs fixing
to make it issue a warning".
For now, don't reuse names.
The last service will be installed as the active one, but "last" may mean
last when iterating over an internal hash table, not last in the file.
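To make the "last one wins" behaviour concrete, here is a minimal plain-Java sketch. `ServiceRegistrySketch`, `registerSilently` and `registerChecked` are hypothetical names, not Fuseki's classes; the sketch only assumes that services end up in a map keyed by name. A plain `put` silently replaces the earlier entry, while `putIfAbsent` returns the existing one, so reuse can be turned into an error at the moment a service is added. Which registration counts as "last" still depends on the order the descriptions are visited, which (as described above) may follow hash iteration rather than file order.

```java
import java.util.HashMap;
import java.util.Map;

public class ServiceRegistrySketch {
    // Hypothetical registry keyed by service name (not Fuseki's actual class).
    private final Map<String, String> byName = new HashMap<>();

    // Mirrors the behaviour described above: put() silently replaces any
    // service previously registered under the same name.
    void registerSilently(String name, String datasetLabel) {
        byName.put(name, datasetLabel);
    }

    // Checked alternative: putIfAbsent() keeps the first entry and returns it,
    // so a reused name can be reported when the service is added.
    void registerChecked(String name, String datasetLabel) {
        String existing = byName.putIfAbsent(name, datasetLabel);
        if (existing != null) {
            throw new IllegalStateException("Service name '" + name
                    + "' is already used by " + existing);
        }
    }

    String lookup(String name) {
        return byName.get(name);
    }

    public static void main(String[] args) {
        ServiceRegistrySketch silent = new ServiceRegistrySketch();
        silent.registerSilently("data", "TDB dataset A");
        silent.registerSilently("data", "TDB dataset B");
        System.out.println(silent.lookup("data")); // TDB dataset B

        ServiceRegistrySketch checked = new ServiceRegistrySketch();
        checked.registerChecked("data", "TDB dataset A");
        try {
            checked.registerChecked("data", "TDB dataset B");
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```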
>
> Here is what I currently have in my TDB assembler:
>
> https://gist.github.com/4650969
>
> I was using the same fuseki:name in order to later have a single
> RewriteRule which points to the same query service e.g.,
> http://localhost:3030/data/query to be used from different SPARQL
> endpoints. But I guess that's not going to happen in any case.
>
> If you'd like me to generate particular logging output for this or force a
> warning/error somehow, please let me know how to go about that and I'll
> report back.
Patches attached to JIRA are best for me.
Probably somewhere near "FusekiConfig.configure", either as a sanity
precheck or when adding to the server config object
(at a guess - it's been a while since I went near that code).
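The sanity-precheck option could look roughly like this: collect every configured service name first, and refuse to start if any name appears twice. This is plain Java, and `ServiceNamePrecheck`, `duplicateNames` and `checkUnique` are hypothetical; how the `fuseki:name` values would be gathered from the assembler file is assumed and left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ServiceNamePrecheck {
    // Returns each name that occurs more than once, in order of first repeat.
    static List<String> duplicateNames(List<String> names) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String n : names) {
            if (!seen.add(n)) {
                dups.add(n);
            }
        }
        return new ArrayList<>(dups);
    }

    // Hypothetical precheck, run before any service is registered:
    // fail fast and name every reused fuseki:name value.
    static void checkUnique(List<String> names) {
        List<String> dups = duplicateNames(names);
        if (!dups.isEmpty()) {
            throw new IllegalArgumentException("Duplicate fuseki:name values: " + dups);
        }
    }

    public static void main(String[] args) {
        // Assumed input: names already extracted from the configuration.
        List<String> configured = List.of("A", "B", "data", "data");
        System.out.println(duplicateNames(configured)); // [data]
        try {
            checkUnique(configured);
        } catch (IllegalArgumentException e) {
            System.out.println("Config rejected: " + e.getMessage());
        }
    }
}
```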
Andy
>
>> You'll be wanting datasets-of-datasets next!
>
> Of course! Anything less would be uncivilized.
>
> -Sarven
>
Re: Fuseki service with multiple TDB datasets
Posted by Sarven Capadisli <in...@csarven.ca>.
On 01/27/2013 11:01 PM, Andy Seaborne wrote:
> On 27/01/13 15:30, Sarven Capadisli wrote:
>> Hi,
>>
>> Is it possible to run a single Fuseki server with multiple TDB datasets
>> such that the queries span over those datasets?
>
> The short answer is "no".
>
> The long answer is that a union dataset could be developed that gave a
> dataset view of 2+ sub-datasets. It would not be able to push down full
> queries so there are performance implications at scale but for many uses
> it might be useful.
>
> SERVICE is the nearest it gets.
>
> (But why not use one dataset with careful named graphs? You can use GSP
> (Graph Store Protocol) to manage the data, which is convenient.)
I am using named graphs. I understand that it can work that way,
however, I find it inconvenient to drop or update graphs - and I can't
say that I generally have a /good/ response rate in comparison to simply
rebuilding the store. Having separate databases also makes it easier to
hotswap.
>> For instance, a query performed at service A/query (with dataset A) may
>> also contain results from dataset B. And vice versa if the query is performed
>> at service B/query.
>>
>> I realize that it may be a bit too much to ask for something like this
>> (and probably not possible), but my idea was to use separate databases
>> so that I can update them independently with a lot of flexibility,
>> meanwhile having the SPARQL endpoint(s) see all of the data without the
>> need for federated queries.
>
> What happens (= what does it mean) if both datasets have the same named
> graph?
I've just tried that and it looks like it had no effect. It still showed
the results from the second service.
>> If I use the same fuseki:name for each fuseki:Service, the last one
>> takes over and the query result reveals data only from its dataset.
>
> The names have to be different. There probably should have been a
> warning/error if a name is reused (patches welcome!).
Gave no warning or error.
Here is what I currently have in my TDB assembler:
https://gist.github.com/4650969
I was using the same fuseki:name in order to later have a single
RewriteRule which points to the same query service e.g.,
http://localhost:3030/data/query to be used from different SPARQL
endpoints. But I guess that's not going to happen in any case.
If you'd like me to generate particular logging output for this or force a
warning/error somehow, please let me know how to go about that and I'll
report back.
> You'll be wanting datasets-of-datasets next!
Of course! Anything less would be uncivilized.
-Sarven
Re: Fuseki service with multiple TDB datasets
Posted by Andy Seaborne <an...@apache.org>.
On 27/01/13 15:30, Sarven Capadisli wrote:
> Hi,
>
> Is it possible to run a single Fuseki server with multiple TDB datasets
> such that the queries span over those datasets?
The short answer is "no".
The long answer is that a union dataset could be developed that gave a
dataset view of 2+ sub-datasets. It would not be able to push down full
queries so there are performance implications at scale but for many uses
it might be useful.
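A toy illustration of the union-dataset idea, in plain Java rather than Jena's API: `UnionDatasetSketch`, `Dataset` and `UnionDataset` are hypothetical, and quads are plain strings. Each single-pattern lookup is sent to every sub-dataset and the matches are merged, which shows why full queries cannot be pushed down: any join across patterns has to be evaluated above the union layer. It also takes one position on what it would mean for both datasets to hold the same named graph: this sketch simply merges them.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

public class UnionDatasetSketch {
    // A quad: graph, subject, predicate, object (plain strings in this toy model).
    record Quad(String g, String s, String p, String o) {}

    // Toy in-memory dataset: an insertion-ordered set of quads.
    static final class Dataset {
        private final Set<Quad> quads = new LinkedHashSet<>();

        Dataset add(String g, String s, String p, String o) {
            quads.add(new Quad(g, s, p, o));
            return this;
        }

        // Match a single quad pattern; null acts as a wildcard.
        Stream<Quad> find(String g, String s, String p, String o) {
            return quads.stream().filter(q ->
                    (g == null || g.equals(q.g()))
                    && (s == null || s.equals(q.s()))
                    && (p == null || p.equals(q.p()))
                    && (o == null || o.equals(q.o())));
        }
    }

    // Read-only union view over several datasets. Each single-pattern lookup
    // goes to every part and the matches are merged; joins between patterns
    // would have to be evaluated above this layer, not pushed down.
    static final class UnionDataset {
        private final List<Dataset> parts;

        UnionDataset(Dataset... parts) {
            this.parts = List.of(parts);
        }

        List<Quad> find(String g, String s, String p, String o) {
            Set<Quad> merged = new LinkedHashSet<>(); // drops quads present in more than one part
            for (Dataset d : parts) {
                d.find(g, s, p, o).forEach(merged::add);
            }
            return new ArrayList<>(merged);
        }
    }

    public static void main(String[] args) {
        Dataset a = new Dataset().add("http://example.org/g1", "ex:x", "ex:p", "1");
        Dataset b = new Dataset()
                .add("http://example.org/g1", "ex:y", "ex:p", "2")
                .add("http://example.org/g2", "ex:z", "ex:p", "3");
        UnionDataset union = new UnionDataset(a, b);
        // g1 exists in both parts; this sketch treats it as one merged graph.
        System.out.println(union.find("http://example.org/g1", null, null, null).size()); // 2
    }
}
```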
SERVICE is the nearest it gets.
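For reference, a sketch of what using SERVICE looks like in practice: a query sent to dataset A's endpoint that also pulls matches from dataset B through SPARQL 1.1 `SERVICE`. The Java wrapper only assembles the query string; the endpoint URL `http://localhost:3030/B/query` is hypothetical. The remote part is still a federated query, which is what the original question hoped to avoid.

```java
public class ServiceQuerySketch {
    // Builds a query to be sent to dataset A's endpoint. The SERVICE block is
    // evaluated by forwarding that part to the other endpoint over HTTP, so
    // this is still a federated query, just issued through one endpoint.
    static String crossDatasetQuery(String otherEndpoint) {
        return String.join("\n",
                "SELECT ?s ?p ?o WHERE {",
                "  { ?s ?p ?o }",                                       // dataset A's default graph
                "  UNION",
                "  { SERVICE <" + otherEndpoint + "> { ?s ?p ?o } }",   // dataset B, remote
                "}");
    }

    public static void main(String[] args) {
        // Hypothetical endpoint for dataset B on the same Fuseki server.
        System.out.println(crossDatasetQuery("http://localhost:3030/B/query"));
    }
}
```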
(But why not use one dataset with careful named graphs? You can use GSP
(Graph Store Protocol) to manage the data, which is convenient.)
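To illustrate the GSP suggestion: with the SPARQL 1.1 Graph Store HTTP Protocol, replacing a named graph is a single HTTP PUT and dropping it is a single DELETE, with the graph identified by the `?graph=` parameter. This minimal sketch only builds the requests with the JDK's `java.net.http` classes (Java 11+). The endpoint `http://localhost:3030/ds/data` and the graph IRI are assumptions, so adjust them to the service names in your own configuration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class GspSketch {
    // Indirect graph identification: the target graph is named by ?graph=.
    static URI graphUri(String gspEndpoint, String graphName) {
        return URI.create(gspEndpoint + "?graph="
                + URLEncoder.encode(graphName, StandardCharsets.UTF_8));
    }

    // PUT replaces the whole named graph with the request body.
    static HttpRequest replaceGraph(String gspEndpoint, String graphName, String turtle) {
        return HttpRequest.newBuilder(graphUri(gspEndpoint, graphName))
                .header("Content-Type", "text/turtle")
                .PUT(HttpRequest.BodyPublishers.ofString(turtle))
                .build();
    }

    // DELETE drops the named graph.
    static HttpRequest dropGraph(String gspEndpoint, String graphName) {
        return HttpRequest.newBuilder(graphUri(gspEndpoint, graphName))
                .DELETE()
                .build();
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:3030/ds/data"; // hypothetical GSP endpoint
        HttpRequest put = replaceGraph(endpoint, "http://example.org/graph/A",
                "<http://example.org/s> <http://example.org/p> \"o\" .");
        System.out.println(put.method() + " " + put.uri());
        // To send: HttpClient.newHttpClient().send(put, HttpResponse.BodyHandlers.ofString())
    }
}
```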
> For instance, a query performed at service A/query (with dataset A) may
> also contain results from dataset B. And vice versa if the query is performed
> at service B/query.
>
> I realize that it may be a bit too much to ask for something like this
> (and probably not possible), but my idea was to use separate databases
> so that I can update them independently with a lot of flexibility,
> meanwhile having the SPARQL endpoint(s) see all of the data without the
> need for federated queries.
What happens (= what does it mean) if both datasets have the same named
graph?
> If I use the same fuseki:name for each fuseki:Service, the last one
> takes over and the query result reveals data only from its dataset.
The names have to be different. There probably should have been a
warning/error if a name is reused (patches welcome!).
>
> Thanks,
>
> -Sarven
You'll be wanting datasets-of-datasets next!
Andy
>
> http://csarven.ca/#i
>