Posted to users@jena.apache.org by Benjamin Geer <be...@dasch.swiss> on 2020/04/03 14:38:43 UTC

questions about reasoning with TDB

Hello,

I’ve been reading the documentation and list archives about Fuseki assembler configurations with TDB and reasoners, and I’m trying to figure out whether the setup I’d like to use is possible. I have three questions:

1. I’d like to use a forward-chaining reasoner to improve query performance with a large TDB dataset by inferring some frequently queried relations. To avoid having to recompute all the inferred triples every time Fuseki is started (which could take a long time), I’d like to persist the inferred triples in TDB as well. Is that possible? I looked for this scenario in the Jena documentation but didn’t find it.

2. For queries, I’d like a default graph containing the union of all named graphs plus the inferred statements. Can this be done along with (1)?

3. The named graphs in the base model need to be continually updated (always using SPARQL quad patterns), and I’d like the reasoner to update its inferences when that happens. After reading some old messages on this list, I think this might not be possible, because if I understand correctly, the only way to update the base model would be via a separate Fuseki service that updates the underlying TDB dataset directly, and in that case, the reasoner won’t see those updates until Fuseki is restarted. Did I understand that correctly, and if so, is it still true?

http://mail-archives.apache.org/mod_mbox/jena-users/201303.mbox/%3c514AEBAF.2020906@apache.org%3e

http://mail-archives.apache.org/mod_mbox/jena-users/201603.mbox/%3c56EC23E3.9010000@apache.org%3e

Ben


Re: questions about reasoning with TDB

Posted by Benjamin Geer <be...@dasch.swiss>.
> On 7 Apr 2020, at 22:29, Andy Seaborne <an...@apache.org> wrote:
> 
> Benjamin,
> 
> What complexity of reasoning are you doing?

Hi Andy,

Thanks for asking, and sorry I didn’t see your reply sooner.

Basically I would just like a subset of RDFS along with owl:TransitiveProperty. I was going to try this custom rule set:

# OWL axioms
[owlClass: (?a rdf:type owl:Class) -> (?a rdf:type rdfs:Class)]
[owlObjectProp: (?a rdf:type owl:ObjectProperty) -> (?a rdf:type rdf:Property)]
[owlDatatypeProp: (?a rdf:type owl:DatatypeProperty) -> (?a rdf:type rdf:Property)]
[owlAnnotationProp: (?a rdf:type owl:AnnotationProperty) -> (?a rdf:type rdf:Property)]

# rdfs:subPropertyOf
[rdfs5: (?a rdfs:subPropertyOf ?b), (?b rdfs:subPropertyOf ?c) -> (?a rdfs:subPropertyOf ?c)]
[rdfs6: (?a rdf:type rdf:Property) -> (?a rdfs:subPropertyOf ?a)]
[rdfs7: (?x ?a ?y), (?a rdfs:subPropertyOf ?b) -> (?x  ?b  ?y)]

# rdfs:subClassOf
[rdfs9: (?a rdf:type ?x), (?x rdfs:subClassOf ?y) -> (?a rdf:type ?y)]
[rdfs10: (?x rdf:type rdfs:Class) -> (?x rdfs:subClassOf ?x)]
[rdfs11: (?x rdfs:subClassOf ?y), (?y rdfs:subClassOf ?z) -> (?x rdfs:subClassOf ?z)]

# owl:TransitiveProperty
[prp_trp: (?p rdf:type owl:TransitiveProperty), (?x ?p ?y), (?y ?p ?z) -> (?x ?p ?z)]
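
To check my understanding of what those rules should compute, here is a naive fixpoint sketch in Python of three of them (rdfs9, rdfs11 and prp_trp). This is just the declarative semantics; it has nothing to do with how Jena's engine actually evaluates rules, and the example data is made up:

```python
# Naive forward-chaining fixpoint over rdfs9, rdfs11 and prp_trp.
# A sketch of the rules' semantics only, not Jena's RETE-based engine.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
TRANSITIVE = "owl:TransitiveProperty"

def infer(triples):
    """Apply the three rules repeatedly until no new triples appear."""
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            # rdfs11: rdfs:subClassOf is transitive
            if p == SUBCLASS:
                for (s2, p2, o2) in closed:
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, SUBCLASS, o2))
            # rdfs9: rdf:type propagates up the class hierarchy
            if p == RDF_TYPE:
                for (s2, p2, o2) in closed:
                    if p2 == SUBCLASS and s2 == o:
                        new.add((s, RDF_TYPE, o2))
            # prp_trp: user-declared transitive properties
            if (p, RDF_TYPE, TRANSITIVE) in closed:
                for (s2, p2, o2) in closed:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= closed:
            return closed
        closed |= new

data = {
    ("ex:Book", SUBCLASS, "ex:Publication"),
    ("ex:Publication", SUBCLASS, "ex:Resource"),
    ("ex:b1", RDF_TYPE, "ex:Book"),
    ("ex:partOf", RDF_TYPE, TRANSITIVE),
    ("ex:p1", "ex:partOf", "ex:p2"),
    ("ex:p2", "ex:partOf", "ex:p3"),
}
result = infer(data)
```

On that toy data this materialises, among other things, (ex:b1 rdf:type ex:Resource) and (ex:p1 ex:partOf ex:p3).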

Currently I’m using SPARQL property path syntax instead of inference, like this:

?resource rdf:type ?resourceType .
?resourceType rdfs:subClassOf* example:SomeClass .

Better performance would be the only reason for me to use inference. I’ve been assuming (perhaps incorrectly?) that backward chaining inference wouldn't be more efficient than just using property path syntax in the query. Is that a reasonable assumption?

> But RDFS+ level of complexity could take a different approach to the current rules. Essentially: backward chaining (which sees updates) combined with materialized transitive properties.

Could you explain what you mean by materialized transitive properties?

> It might be possible to go a bit further than that. Rules that generate a single triple from a BGP+FILTER(+BIND), together with transitive properties (not written as rules), might be possible.

Do you mean using the TransitiveReasoner to handle rdfs:subPropertyOf and rdfs:subClassOf, and the GenericRuleReasoner for anything else?

Ben


Re: questions about reasoning with TDB

Posted by Andy Seaborne <an...@apache.org>.
Inline:

On 04/04/2020 12:23, Dave Reynolds wrote:
> Hi,
> 
> On 03/04/2020 15:38, Benjamin Geer wrote:
>> I’ve been reading the documentation and list archives about Fuseki 
>> assembler configurations with TDB and reasoners, and I’m trying to 
>> figure out whether the setup I’d like to use is possible. I have three 
>> questions:
>>
>> 1. I’d like to use a forward-chaining reasoner to improve query 
>> performance with a large TDB dataset by inferring some frequently 
>> queried relations. To avoid having to recompute all the inferred 
>> triples every time Fuseki is started (which could take a long time), 
>> I’d like to persist the inferred triples in TDB as well. Is that 
>> possible? I looked for this scenario in the Jena documentation but 
>> didn’t find it.
> 
> Basically this isn't supported, sorry.

Benjamin,

What complexity of reasoning are you doing?

There is a tradeoff of complexity/performance at scale/effort needed.
And "effort needed" for complex+scale can be huge.

But RDFS+ level of complexity could take a different approach to the 
current rules. Essentially: backward chaining (which sees updates) 
combined with materialized transitive properties.
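
Sketching the idea in Python (illustrative only, not Jena code): keep the closure of the transitive relation precomputed as a table, maintained on update, and answer rdfs9-style questions at query time by a lookup plus one backward step, so no forward-rule state needs to be held:

```python
# Sketch: materialize the transitive closure of e.g. rdfs:subClassOf,
# then answer "is ?x an instance of C?" at query time by lookup plus
# one backward step (rdfs9), instead of forward-chaining everything.
from collections import defaultdict

def transitive_closure(edges):
    """edges: set of (sub, super) pairs; returns the full closure."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c in list(succ[b]):
                if (a, c) not in closure:
                    closure.add((a, c))
                    succ[a].add(c)
                    changed = True
    return closure

# Materialized once, updated incrementally when the hierarchy changes.
subclass = transitive_closure({
    ("ex:Book", "ex:Publication"),
    ("ex:Publication", "ex:Resource"),
})

types = {("ex:b1", "ex:Book")}  # asserted rdf:type triples

def has_type(instance, cls):
    """Backward step for rdfs9 against the materialized closure."""
    return any(
        t == cls or (t, cls) in subclass
        for (i, t) in types if i == instance
    )
```

The query-time work is then a join against a precomputed table rather than rule firing.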

It might be possible to go a bit further than that. Rules that 
generate a single triple from a BGP+FILTER(+BIND), together with 
transitive properties (not written as rules), might be possible.

> The forward chaining engine keeps a *lot* of state in memory in the 
> RETE-like network. Which means unless you have very selective patterns 
> in your rules you can end up with large parts of the data in memory. In 
> worst cases you can have multiple copies.
> 
> This has several implications:
> 
> First, it means that it's not scalable. If you have a very large TDB 
> dataset then the reasoner is likely to run out memory. Plus the internal 
> format is really not optimised for large scale data and inference speed 
> will take a hit.
> 
> Second, it means that there's no point persisting the inference results 
> on their own, unless they are static. If, as in your case, you want to 
> continue to add new data and get incremental inferencing then you would 
> need some way to preserve and restore the intermediate state in the 
> engine, which is not supported.
> 
> So given this there's little point in supporting having the deductions 
> graph in TDB because that doesn't solve the problems of scaling and 
> restart.
> 
>> 2. For queries, I’d like a default graph containing the union of all 
>> named graphs plus the inferred statements. Can this be done along with 
>> (1)?
> 
> The first part can be done manually but not along with (1).
> 
> It's possible to use some offline process to generate a static set of 
> inferences (whether using the rule engine or e.g. SPARQL construct 
> queries) to one named graph, put the data in another graph and then have 
> the default graph be the union.
> 
> However, your data isn't static so this doesn't help.
> 
>> 3. The named graphs in the base model need to be continually updated 
>> (always using SPARQL quad patterns), and I’d like the reasoner to 
>> update its inferences when that happens. After reading some old 
>> messages on this list, I think this might not be possible, because if 
>> I understand correctly, the only way to update the base model would be 
>> via a separate Fuseki service that updates the underlying TDB dataset 
>> directly, and in that case, the reasoner won’t see those updates until 
>> Fuseki is restarted. Did I understand that correctly, and if so, is it 
>> still true?
> 
> I thought you could configure Fuseki to have a reasoner as the source 
> model and so have updates go to the reasoner rather than a base graph. 
> However, given that none of the rest of what you need to do is 
> supported, this point is moot.

Yes, but not for the union default graph. That only exists for query 
(and for WHERE clauses in SPARQL Update); it isn't updatable. If you 
update the named graphs, the union sees the change, but that bypasses 
the reasoner on the graph.

> 
> Sorry to not be able to support your use case.
> 
> Dave

Re: questions about reasoning with TDB

Posted by Benjamin Geer <be...@dasch.swiss>.
> On 4 Apr 2020, at 13:23, Dave Reynolds <da...@gmail.com> wrote:

> Basically this isn't supported, sorry.


Thank you for this very helpful explanation. I guess I’ll make do without reasoning.

Ben


Re: questions about reasoning with TDB

Posted by Dave Reynolds <da...@gmail.com>.
Hi,

On 03/04/2020 15:38, Benjamin Geer wrote:
> I’ve been reading the documentation and list archives about Fuseki assembler configurations with TDB and reasoners, and I’m trying to figure out whether the setup I’d like to use is possible. I have three questions:
> 
> 1. I’d like to use a forward-chaining reasoner to improve query performance with a large TDB dataset by inferring some frequently queried relations. To avoid having to recompute all the inferred triples every time Fuseki is started (which could take a long time), I’d like to persist the inferred triples in TDB as well. Is that possible? I looked for this scenario in the Jena documentation but didn’t find it.

Basically this isn't supported, sorry.

The forward chaining engine keeps a *lot* of state in memory in the 
RETE-like network. Which means unless you have very selective patterns 
in your rules you can end up with large parts of the data in memory. In 
worst cases you can have multiple copies.
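
To make that concrete, here is a much-simplified sketch in Python (nothing like the real engine): a two-condition rule has to remember every match of its first condition in a join memory so later triples can be joined incrementally, and if that condition is not very selective, the memory grows with the data:

```python
# Much-simplified sketch of why a RETE-style engine holds data in memory:
# a two-condition rule stores every match of its first condition so that
# triples arriving later can be joined against them incrementally.

def make_rule():
    # Rule: (?p rdf:type owl:TransitiveProperty), (?x ?p ?y) -> match ?p
    beta_memory = []          # stored partial matches for condition 1
    transitive_props = set()  # complete matches of both conditions

    def feed(triple):
        s, p, o = triple
        if p == "rdf:type" and o == "owl:TransitiveProperty":
            beta_memory.append(triple)     # condition 1 matched: kept forever
        for (tp, _, _) in beta_memory:     # join condition 2 against memory
            if p == tp:
                transitive_props.add(p)
        return beta_memory, transitive_props

    return feed

feed = make_rule()
feed(("ex:partOf", "rdf:type", "owl:TransitiveProperty"))
memory, props = feed(("ex:a", "ex:partOf", "ex:b"))
```

Here condition 1 is selective, so the memory stays tiny; with a broad first condition (say, any typed resource), the stored partial matches approach the size of the dataset.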

This has several implications:

First, it means that it's not scalable. If you have a very large TDB 
dataset then the reasoner is likely to run out memory. Plus the internal 
format is really not optimised for large scale data and inference speed 
will take a hit.

Second, it means that there's no point persisting the inference results 
on their own, unless they are static. If, as in your case, you want to 
continue to add new data and get incremental inferencing then you would 
need some way to preserve and restore the intermediate state in the 
engine, which is not supported.

So given this there's little point in supporting having the deductions 
graph in TDB because that doesn't solve the problems of scaling and restart.

> 2. For queries, I’d like a default graph containing the union of all named graphs plus the inferred statements. Can this be done along with (1)?

The first part can be done manually but not along with (1).

It's possible to use some offline process to generate a static set of 
inferences (whether using the rule engine or e.g. SPARQL construct 
queries) to one named graph, put the data in another graph and then have 
the default graph be the union.
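
The shape of that static setup can be sketched like this (illustrative Python only; the graph names are made up, and in Fuseki this would be TDB named graphs with tdb:unionDefaultGraph, with the offline step done by CONSTRUCT or INSERT queries):

```python
# Sketch of offline materialization: data graphs plus one graph of
# precomputed inferences, queried through a union view. Illustrative
# only; not Jena's API.

dataset = {
    "ex:graph1": {("ex:b1", "rdf:type", "ex:Book")},
    "ex:graph2": {("ex:Book", "rdfs:subClassOf", "ex:Publication")},
}

def materialize(dataset):
    """Offline step: compute rdfs9-style inferences into one graph."""
    union = set().union(*dataset.values())
    inferred = {
        (s, "rdf:type", sup)
        for (s, p, t) in union if p == "rdf:type"
        for (sub, p2, sup) in union
        if p2 == "rdfs:subClassOf" and sub == t
    }
    dataset["ex:inferred"] = inferred

def union_default_graph(dataset):
    """Query-time view: union of all named graphs (read-only)."""
    return set().union(*dataset.values())

materialize(dataset)
view = union_default_graph(dataset)
```

Queries against the union view then see both asserted and inferred triples, but the inferred graph is only as fresh as the last offline run.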

However, your data isn't static so this doesn't help.

> 3. The named graphs in the base model need to be continually updated (always using SPARQL quad patterns), and I’d like the reasoner to update its inferences when that happens. After reading some old messages on this list, I think this might not be possible, because if I understand correctly, the only way to update the base model would be via a separate Fuseki service that updates the underlying TDB dataset directly, and in that case, the reasoner won’t see those updates until Fuseki is restarted. Did I understand that correctly, and if so, is it still true?

I thought you could configure Fuseki to have a reasoner as the source 
model and so have updates go to the reasoner rather than a base graph. 
However, given that none of the rest of what you need to do is 
supported, this point is moot.

Sorry to not be able to support your use case.

Dave