Posted to users@jena.apache.org by Dave Reynolds <da...@gmail.com> on 2017/04/02 08:06:34 UTC

Re: persistent inference on named graphs in Fuseki

[Sorry to be slow to respond, I was hoping someone who understands 
assemblers might take this.]

On 31/03/17 08:10, Élie Roux wrote:
> Hello,
>
> I am currently setting up a Fuseki server with the following specs in mind:
>  - everything persistent in TDB (it works)
>  - many different named graphs, with the default graph being the union of
> them (it works, but without inference)
>  - very simple inferencing (works but not with named graphs)
>  - inferenced triples stored in TDB
>
>  The first question is the following: is it realistic?

Sorry, not really.

Whether it's practical at all depends on the size of your data and how 
dynamic it is.

Jena's inference is purely in memory so running over a TDB store is 
possible but doesn't give you any scalability and is slower than running 
over an in-memory copy of the same data. Plus, as you already know, it's 
not named-graphs-aware.

Assuming modest data sizes and static data then you could either:

1. Have all your base data in named graphs in TDB. Externally run a 
reasoner over the union of that data (either raw from the TDB or by 
materializing an in-memory copy of the union). Then materialize all the 
inferred triples and add them as another graph in the TDB - either 
directly to the TDB before you run fuseki or via fuseki using the graph API.

2. Set up the inference as a separate in-memory dataset within Fuseki 
which reasons over the union of your TDB graphs and provides a view of 
that union + inferences but doesn't store the results back into TDB.

Option 1 means using some external process to do the inference and is no 
use if your data is dynamic. However, for static data it does mean that 
once the data has been created you can query at full speed without 
inference slowdowns and you can restart fuseki without losing anything.
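
As a rough illustration of option 1, here is a toy model in plain Java. It does not use any Jena classes: a triple is just a three-element list, a dataset is a map from graph name to a set of triples, and the only "reasoner" is a single hard-coded subclass rule. The graph name urn:example:inferred and every method name are made up for the sketch. It shows the shape of the workflow only: take the union, forward-chain to a fixpoint, keep just the new triples, and store them as one more named graph.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types or names are Jena API.
final class MaterializeSketch {
    static final String TYPE = "rdf:type";
    static final String SUBCLASS = "rdfs:subClassOf";
    // Hypothetical name for the graph that receives the inferred triples.
    static final String INFERRED_GRAPH = "urn:example:inferred";

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Union of the triples in every named graph of the dataset.
    static Set<List<String>> union(Map<String, Set<List<String>>> dataset) {
        Set<List<String>> all = new HashSet<>();
        for (Set<List<String>> g : dataset.values()) {
            all.addAll(g);
        }
        return all;
    }

    // Forward-chain one rule to a fixpoint:
    //   (x rdf:type C), (C rdfs:subClassOf D) -> (x rdf:type D)
    // Returns only the derived triples, not the base data.
    static Set<List<String>> infer(Set<List<String>> base) {
        Set<List<String>> closure = new HashSet<>(base);
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(closure);
            for (List<String> t : snapshot) {
                if (!TYPE.equals(t.get(1))) continue;
                for (List<String> sc : snapshot) {
                    if (SUBCLASS.equals(sc.get(1)) && sc.get(0).equals(t.get(2))
                            && closure.add(triple(t.get(0), TYPE, sc.get(2)))) {
                        changed = true;
                    }
                }
            }
        }
        closure.removeAll(base);
        return closure;
    }

    // Option 1: infer once over the union, store the result as another graph.
    static void materialize(Map<String, Set<List<String>>> dataset) {
        dataset.remove(INFERRED_GRAPH); // drop any stale result before recomputing
        dataset.put(INFERRED_GRAPH, infer(union(dataset)));
    }
}
```

After materialize() runs, queries need no reasoner at all, which is the point of option 1. The cost is that any change to the base graphs means running it again.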

Option 2 avoids the need for external processing but will be slow to 
start up (the inferences aren't being persisted) and slow to query (some 
inferences will be performed on-demand). Sadly it wouldn't support dynamic 
data either. You could submit changes to the in-memory inference graph 
but when it tries to store those into the base model that will fail 
because it's a union instead of an actual graph. You could submit 
changes to the TDB store but the in-memory inference won't notice those 
automatically and you would have to restart fuseki to see them.
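
The staleness problem with option 2 can be pictured with another toy model in plain Java (again no Jena classes, and all names are invented for the sketch). The "store" is a mutable set of triples, the "inference view" computes its deductions once when it is built, and rebuild() stands in for restarting the server. The single rule treats a hypothetical property ex:knows as symmetric. How a real engine behaves will differ in detail; the sketch only shows why deductions computed in memory do not follow later changes to the store.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types or names are Jena API.
final class StaleViewSketch {
    static final String KNOWS = "ex:knows"; // hypothetical symmetric property

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    private final Set<List<String>> store;  // stands in for the persistent store
    private Set<List<String>> deductions;   // held in memory only, never written back

    StaleViewSketch(Set<List<String>> store) {
        this.store = store;
        rebuild();
    }

    // Recompute deductions from the current store contents.
    // Rule: (a ex:knows b) -> (b ex:knows a)
    void rebuild() {
        Set<List<String>> derived = new HashSet<>();
        for (List<String> t : store) {
            if (KNOWS.equals(t.get(1))) {
                derived.add(triple(t.get(2), KNOWS, t.get(0)));
            }
        }
        deductions = derived;
    }

    // The view answers from the live store plus the cached deductions.
    boolean contains(List<String> t) {
        return store.contains(t) || deductions.contains(t);
    }
}
```

A triple added to the store after the view was built is visible as asserted data, but whatever should follow from it is missing until rebuild() is called.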

Dave


Re: persistent inference on named graphs in Fuseki

Posted by Dave Reynolds <da...@gmail.com>.
On 02/04/17 10:04, Élie Roux wrote:
> Hello,
>
>> Jena's inference is purely in memory so running over a TDB store is
>> possible but doesn't give you any scalability and is slower than
>> running over an in-memory copy of the same data. Plus, as you already
>> know, it's not named-graphs-aware.
>
> Thank you for your clarifying answer! I really think this should be made
> clear somewhere in the documentation, as it would have saved us a few
> days of tests trying to understand why fuseki didn't behave as we
> expected...

Sorry about that!

> also maybe a few warning or error messages when fuseki reads
> the docs and sees inference on tdb or unionDefaultGraph?

It should be possible to create an inference graph over TDB with union 
default (that's my option 2) so there's no need for a warning. It's 
"just" that performance may be poor and the inference results won't be 
persisted.

Not sure why it wasn't working for you.

>> Assuming modest data sizes and static data then you could either:
>
> Well, it's fairly large (a few million triples) and very dynamic... but
> we ended up with the following solution, I explain it here in case it
> can help people with the same needs:
>
> In the code that transfers data to fuseki, we run a reasoner and add the
> inferred triples in the corresponding named graph, and then transfer all
> the triples (our data + inferred) to fuseki which stores it into TDB.
> This is doable for us because we only need limited forward chaining
> inference that doesn't cross named graphs, but this would certainly not
> work for very large inferences such as the OWL full reasoner across the
> union graph...

Sounds like option 1, glad you have something working.

Dave


Re: persistent inference on named graphs in Fuseki

Posted by Élie Roux <el...@telecom-bretagne.eu>.
Hello,

> Jena's inference is purely in memory so running over a TDB store is
> possible but doesn't give you any scalability and is slower than
> running over an in-memory copy of the same data. Plus, as you already
> know, it's not named-graphs-aware.

Thank you for your clarifying answer! I really think this should be made
clear somewhere in the documentation, as it would have saved us a few
days of tests trying to understand why fuseki didn't behave as we
expected... also maybe a few warning or error messages when fuseki reads
the docs and sees inference on tdb or unionDefaultGraph? Because right
now it just silently misbehaves, which is quite painful for users...

> Assuming modest data sizes and static data then you could either:

Well, it's fairly large (a few million triples) and very dynamic... but
we ended up with the following solution, I explain it here in case it
can help people with the same needs:

In the code that transfers data to fuseki, we run a reasoner and add the
inferred triples in the corresponding named graph, and then transfer all
the triples (our data + inferred) to fuseki which stores it into TDB.
This is doable for us because we only need limited forward chaining
inference that doesn't cross named graphs, but this would certainly not
work for very large inferences such as the OWL full reasoner across the
union graph...
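
To make the trade-off concrete, here is a toy sketch in plain Java of inference done per named graph before upload. It is not our actual code and uses no Jena classes: a triple is a three-element list, a dataset is a map from graph name to a set of triples, and the one rule makes a hypothetical property ex:partOf transitive. All names are invented for the sketch. Each graph is closed on its own and the inferred triples go back into that same graph, so nothing that needs premises from two different graphs is ever derived.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types or names are Jena API.
final class PerGraphSketch {
    static final String PART_OF = "ex:partOf"; // hypothetical transitive property

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Forward-chain within one graph, adding results to that graph in place:
    //   (a ex:partOf b), (b ex:partOf c) -> (a ex:partOf c)
    static void closeGraph(Set<List<String>> graph) {
        boolean changed = true;
        while (changed) {
            changed = false;
            List<List<String>> snapshot = new ArrayList<>(graph);
            for (List<String> a : snapshot) {
                if (!PART_OF.equals(a.get(1))) continue;
                for (List<String> b : snapshot) {
                    if (PART_OF.equals(b.get(1)) && a.get(2).equals(b.get(0))
                            && graph.add(triple(a.get(0), PART_OF, b.get(2)))) {
                        changed = true;
                    }
                }
            }
        }
    }

    // Run the rule on every named graph separately before sending the data on.
    static void prepareForUpload(Map<String, Set<List<String>>> dataset) {
        for (Set<List<String>> graph : dataset.values()) {
            closeGraph(graph);
        }
    }
}
```

Because each graph carries its own inferred triples, replacing one graph replaces its inferences too, which is what makes this workable for dynamic data.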

Thank you,
-- 
Elie

Re: persistent inference on named graphs in Fuseki

Posted by "A. Soroka" <aj...@virginia.edu>.
Datasets are covered very nicely in the RDF core recommendations:

https://www.w3.org/TR/rdf11-concepts/#section-dataset

---
A. Soroka
The University of Virginia Library

> On Apr 2, 2017, at 5:38 AM, Dave Reynolds <da...@gmail.com> wrote:
> 
> On 02/04/17 10:25, Laura Morales wrote:
>>>> - no inference over the whole graph, only inference on a single graph
>>> 
>>> No inference support over the whole *Dataset*.
>> 
>> "whole graph" I mean 2 or more graphs loaded into the server, that together make a larger graph. Isn't this the same thing as "dataset"? Or am I missing something?
>> 
> 
> A Dataset is a collection of graphs comprising one default graph and zero or more named graphs.
> 
> The default graph in a dataset may be completely distinct from the named graphs or may contain some precomputed combination of them or (e.g. with TDB union default) you can arrange for the default graph to give the appearance of being the union of all the triples in all the named graphs. These are all choices, the notion of a dataset doesn't enforce any particular implementation for the default graph.
> 
> My point is that Jena's rule-based inference engines don't know anything about datasets, just about graphs.
> 
> However, you can point an inference engine at any graph in TDB including the union graph (either by using union default and pointing to the default graph or by pointing to the pseudo named graph urn:x-arq:UnionGraph). Then you are indeed performing inference over the union of the data; it's just that the inference engine doesn't know that or care.
> 
> Dave
> 


Re: persistent inference on named graphs in Fuseki

Posted by Dave Reynolds <da...@gmail.com>.
On 02/04/17 10:25, Laura Morales wrote:
>>> - no inference over the whole graph, only inference on a single graph
>>
>> No inference support over the whole *Dataset*.
>
> "whole graph" I mean 2 or more graphs loaded into the server, that together make a larger graph. Isn't this the same thing as "dataset"? Or am I missing something?
>

A Dataset is a collection of graphs comprising one default graph and zero 
or more named graphs.

The default graph in a dataset may be completely distinct from the named 
graphs or may contain some precomputed combination of them or (e.g. with 
TDB union default) you can arrange for the default graph to give the 
appearance of being the union of all the triples in all the named 
graphs. These are all choices, the notion of a dataset doesn't enforce 
any particular implementation for the default graph.

My point is that Jena's rule-based inference engines don't know anything 
about datasets, just about graphs.

However, you can point an inference engine at any graph in TDB including 
the union graph (either by using union default and pointing to the 
default graph or by pointing to the pseudo named graph 
urn:x-arq:UnionGraph). Then you are indeed performing inference over the 
union of the data; it's just that the inference engine doesn't know that 
or care.
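
The distinction can be sketched with a toy dataset class in plain Java. It is not Jena's Dataset and uses no Jena classes; the class, its methods, and the unionDefault flag are invented for illustration, and only the pseudo graph name urn:x-arq:UnionGraph comes from the description above. The point is that both the default graph and the union graph come back as an ordinary set of triples, so anything that works on a single graph can be handed either one without knowing where the triples came from.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: this is not Jena's Dataset and none of the names are Jena API.
final class DatasetSketch {
    static final String UNION_GRAPH = "urn:x-arq:UnionGraph";

    private final Set<List<String>> storedDefault = new HashSet<>();
    private final Map<String, Set<List<String>>> named = new HashMap<>();
    private final boolean unionDefault;

    DatasetSketch(boolean unionDefault) {
        this.unionDefault = unionDefault;
    }

    static List<String> triple(String s, String p, String o) {
        return Arrays.asList(s, p, o);
    }

    // Add a triple to a real named graph.
    void add(String graphName, List<String> t) {
        if (UNION_GRAPH.equals(graphName)) {
            // Assumption of this sketch: the union is a read-only view.
            throw new UnsupportedOperationException("union graph is a view");
        }
        named.computeIfAbsent(graphName, k -> new HashSet<>()).add(t);
    }

    // Read-only union of all named graphs, computed on request.
    private Set<List<String>> union() {
        Set<List<String>> all = new HashSet<>();
        for (Set<List<String>> g : named.values()) {
            all.addAll(g);
        }
        return Collections.unmodifiableSet(all);
    }

    // With unionDefault the default graph only appears to be the union.
    Set<List<String>> defaultGraph() {
        return unionDefault ? union() : storedDefault;
    }

    // The pseudo name gives the union regardless of the default graph setting.
    Set<List<String>> graph(String name) {
        if (UNION_GRAPH.equals(name)) return union();
        Set<List<String>> g = named.get(name);
        return g == null ? Collections.<List<String>>emptySet() : g;
    }
}
```

The union here is handed out as an unmodifiable set, so it can be read and reasoned over but not written to.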

Dave


Re: persistent inference on named graphs in Fuseki

Posted by Laura Morales <la...@mail.com>.
>> - no inference over the whole graph, only inference on a single graph
>
> No inference support over the whole *Dataset*.

"whole graph" I mean 2 or more graphs loaded into the server, that together make a larger graph. Isn't this the same thing as "dataset"? Or am I missing something?

Re: persistent inference on named graphs in Fuseki

Posted by Dave Reynolds <da...@gmail.com>.
On 02/04/17 09:31, Laura Morales wrote:
>> There's no built in support for inference over a dataset as a
>> whole. There's no support for rules which test which graph a triple is
>> in or which assert results into different graphs etc.
>
> - no inference over the whole graph, only inference on a single graph

No inference support over the whole *Dataset*.

> - no support to test which graph a triple is in
>
> Is this by design, or are there technical limitations to not have these, or simply nobody has implemented them yet?

The latter. The inference system was designed before Sparql and Datasets 
and named graphs existed. Over the years Datasets, rather than single 
Graphs, have been the central concept for a lot of RDF processing and if 
you were designing an inference system now you would take that into account.

Dave




Re: persistent inference on named graphs in Fuseki

Posted by Laura Morales <la...@mail.com>.
> There's no built in support for inference over a dataset as a
> whole. There's no support for rules which test which graph a triple is
> in or which assert results into different graphs etc.

- no inference over the whole graph, only inference on a single graph
- no support to test which graph a triple is in

Is this by design, or are there technical limitations to not have these, or simply nobody has implemented them yet?

Re: persistent inference on named graphs in Fuseki

Posted by Dave Reynolds <da...@gmail.com>.
On 02/04/17 09:18, Laura Morales wrote:
>> Plus, as you already know, it's not named-graphs-aware.
>
> what does this mean?
>

The rule-based inference engines only know about Graphs/Models, not 
Datasets. There's no built in support for inference over a dataset as a 
whole. There's no support for rules which test which graph a triple is 
in or which assert results into different graphs etc.

Dave


Re: persistent inference on named graphs in Fuseki

Posted by Laura Morales <la...@mail.com>.
> Plus, as you already know, it's not named-graphs-aware.

what does this mean?