Posted to users@jena.apache.org by David Jordan <da...@bellsouth.net> on 2013/08/23 02:03:06 UTC

inferencing a small subset of a graph

The default Jena reasoner performs inference on an entire graph. For a very large graph, this inferencing can be fairly expensive. I got asked today whether there is any way to just do inferencing on a small subset of a very large graph.

I am wondering whether it would be feasible and make sense to create a new in-memory graph and then essentially make a copy of the relevant triples from the very large graph into this in-memory graph, and then perform inferencing just on that small graph. The purpose is to answer a query or question on a small subset of the graph without incurring the overhead of doing it for the entire graph.

Is this a common practice? Best practice? Are there any recommended ways to efficiently implement the copy process from the stored graph into the in-memory graph?

If I could get a response by noon EST Friday, that would be great, as I have a presentation at 1 where this may come up.



Re: inferencing a small subset of a graph

Posted by Arthur Vaïsse-Lesteven <ar...@yahoo.fr>.

If you don't have to use a reasoner, you can do both the copy and the inference with SPARQL queries.

An INSERT WHERE query lets you copy selected data into a target graph (there is also a COPY operation, if you want to copy an entire graph directly), and you can then run inference only on that target graph with queries like this one:

# OWL inference of symmetric triples.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

INSERT { GRAPH <inference graph> { ?o ?p ?s } }
WHERE {
    GRAPH <targeted graph> { ?s ?p ?o } .
    ?p rdf:type owl:SymmetricProperty .
}


The inference graph at the end will contain all the inferred triples.
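The copy step could be sketched as an INSERT WHERE along these lines (the class IRI is a hypothetical placeholder; <targeted graph> is the working graph from the query above):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

    # Copy every statement about instances of one class into the target graph.
    INSERT { GRAPH <targeted graph> { ?s ?p ?o } }
    WHERE {
        ?s rdf:type <http://example.org/MyClass> ;
           ?p ?o .
    }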

Hope this helps!

VAÏSSE-LESTEVEN Arthur.




Re: inferencing a small subset of a graph

Posted by Dave Reynolds <da...@gmail.com>.
On 23/08/13 01:03, David Jordan wrote:
>
> The default Jena reasoner performs inference on an entire graph. For a very large graph, this inferencing can be fairly expensive. I got asked today whether there is any way to just do inferencing on a small subset of a very large graph.
>
> I am wondering whether it would be feasible and make sense to create a new in-memory graph and then essentially make a copy of the relevant triples from the very large graph into this in-memory graph, and then perform inferencing just on that small graph. The purpose is to answer a query or question on a small subset of the graph without incurring the overhead of doing it for the entire graph.

It is certainly feasible.

For example, it is not that uncommon to take a graph containing the 
description of a few resources, compute the inference closure of that 
against the full ontologies, and then either query over that closure or 
add the closure into a larger model for later query.

This is one way to get incremental additions with inference to a large 
model.

*However*, this can lead to incomplete inferences so whether the 
approach is viable for any given situation depends on the data, the 
ontology and the queries you need to be able to answer.

For example, simple RDFS inference can be handled this way - so long as 
all RDFS assertions (class hierarchy, property definitions and 
hierarchy) are in the ontology you reason against. For example, if in 
your fragment graph you have:

    :subject a :MyClass; :property :myvalue .

and in your ontology you have:

    :MyClass rdfs:subClassOf :Super .
    :property rdfs:range :R; rdfs:domain :D; rdfs:subPropertyOf :superp .

then you can make the local inferences quite happily, independent of 
what might be in the rest of the data:

    :subject a :MyClass, :Super, :D;
       :property :myvalue;
       :superp :myvalue .
    :myvalue a :R .
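As a sketch in the same SPARQL style as the other reply, the rdfs:subClassOf part of this inference could be materialised with a single update over the fragment plus ontology (the graph name is a hypothetical placeholder):

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # rdfs9: propagate asserted types up the class hierarchy.
    INSERT { GRAPH <fragment graph> { ?s rdf:type ?super } }
    WHERE {
        GRAPH <fragment graph> { ?s rdf:type ?class }
        ?class rdfs:subClassOf+ ?super .
    }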

However, OWL can have longer-range effects, for example transitive 
properties and property chain axioms. These can't be computed on local 
extracts. For example, if :p is a transitive property and your main data 
has:
has:

   :a :p :b .

   :c :p :d .

Then you go to add a fragment:

   :b :p :c .

Then if you can see the whole graph you could infer:

   :a :p :c .
   :a :p :d .
   :b :p :d .

But you can't infer any of these just from the fragment and the ontology.
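By contrast, with the whole graph in view the closure is easy; with a SPARQL 1.1 property path it is just (prefix hypothetical):

    PREFIX : <http://example.org/ns#>

    # Finds :a :p :c, :a :p :d and :b :p :d once the fragment is added -
    # but only because the query can see the whole graph.
    SELECT ?x ?y WHERE { ?x :p+ ?y }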

All of RDFS and OWL are monotonic, which means that the inferences you 
can make over a fragment are always correct; they may just be incomplete.

If by "inference" you include non-monotonic rules or a closed world 
assumption then you can't take the fragment approach at all. You get 
incorrect inferences, not just incomplete ones.

> Is this a common practice? Best practice?

I would describe it as "there are circumstances where this can be useful 
and appropriate". Not best practice in general for the reasons outlined.

> Are there any recommended ways to efficiently implement the copy process from the stored graph into the in-memory graph?

Depends on what defines a fragment for you and how you are accessing 
your data.

If a fragment is a description of a few named resources, and you are 
accessing your data as a local model then you can use the Closure 
utility to extract the bNode closure of their description. If you have 
the same fragment definition but are accessing a remote endpoint then 
use SPARQL DESCRIBE.
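Against a remote endpoint that extraction could be as simple as the following (resource IRIs hypothetical; note that exactly what a DESCRIBE returns is endpoint-defined):

    # Fetch the descriptions of a few named resources.
    DESCRIBE <http://example.org/subject> <http://example.org/myvalue>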

If your graph is split into small connected components and you want to 
extract a complete connected component then see 
ResourceUtils.reachableClosure - but it is pretty rare for an RDF graph 
to be of the right shape for that.

Dave