Posted to users@jena.apache.org by Marcus Cobden <li...@marcuscobden.co.uk> on 2012/01/16 12:42:07 UTC

MultiUnion & Slow Reasoning

I've loaded a TriG graph using ng4j, but I am finding that reasoning over it is particularly slow.
After converting the same graph to N-Triples and working with only Jena models, the reasoning is a lot faster.

Underneath, ng4j is using a MultiUnion graph to combine the named graphs. Could this be causing some of the slowness?

I think there might be a case of double-reification going on: MultiUnion overrides graphBaseFind() to call find() on the wrapped graphs, but MultiUnion.find() as inherited still checks its reifier.

Is this a genuine source of slowness? Or should I be looking somewhere else?
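The suspected call pattern can be sketched with hypothetical toy classes (ToyGraphBase, ToyPlainGraph, ToyUnion) rather than Jena's real ones. The sketch assumes only what the paragraph above describes. The base find() performs a reifier check and then delegates to graphBaseFind(). The union's graphBaseFind() calls the public find() on each member. The sketch counts checks per union find; it says nothing about how expensive each check is in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy classes mirroring the call structure described above;
// they are NOT Jena's GraphBase / MultiUnion implementations.
class ReifierCheckDemo {

    // Counts how many times any graph performs its stand-in "reifier check".
    static int reifierChecks = 0;

    static abstract class ToyGraphBase {
        // Template method: every find() first consults the reifier,
        // then delegates to the subclass hook graphBaseFind().
        final List<String> find(String pattern) {
            reifierChecks++; // stand-in for the reifier lookup
            return graphBaseFind(pattern);
        }

        abstract List<String> graphBaseFind(String pattern);
    }

    static class ToyPlainGraph extends ToyGraphBase {
        private final List<String> triples;

        ToyPlainGraph(List<String> triples) { this.triples = triples; }

        @Override
        List<String> graphBaseFind(String pattern) {
            List<String> out = new ArrayList<>();
            for (String t : triples) {
                if (t.contains(pattern)) out.add(t);
            }
            return out;
        }
    }

    static class ToyUnion extends ToyGraphBase {
        private final List<ToyGraphBase> members;

        ToyUnion(List<ToyGraphBase> members) { this.members = members; }

        @Override
        List<String> graphBaseFind(String pattern) {
            List<String> out = new ArrayList<>();
            // Calls the public find() on each member, so every member
            // repeats the check the union has already performed.
            for (ToyGraphBase g : members) out.addAll(g.find(pattern));
            return out;
        }
    }

    // Number of reifier checks triggered by ONE find() on a union of n members.
    static int checksForOneUnionFind(int n) {
        List<ToyGraphBase> members = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            members.add(new ToyPlainGraph(List.of("s" + i + " p o")));
        }
        ToyUnion union = new ToyUnion(members);
        reifierChecks = 0;
        union.find("p");
        return reifierChecks;
    }

    public static void main(String[] args) {
        // Prints 11: one check for the union plus one per member graph.
        System.out.println(checksForOneUnionFind(10));
    }
}
```

Under these assumptions each union find costs 1 + N checks. Whether that matters depends entirely on how cheap a check is when no reification is present.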

Regards,
Marcus

Re: MultiUnion & Slow Reasoning

Posted by Dave Reynolds <da...@gmail.com>.
On 22/01/12 20:41, Marcus Cobden wrote:
> On 16/01/2012 21:05, Dave Reynolds wrote:
>> On 16/01/12 11:42, Marcus Cobden wrote:
>>> I've loaded a TRIG graph using ng4j, but I am finding that reasoning
>>> over it is particularly slow.
>>> After converting the same graph to N-Triples, and working with only jena
>>> models the reasoning is a lot faster.
>>>
>>> Underneath, ng4j is using a MultiUnion graph to combine the named
>>> graphs, would this be causing some slowness?
>>
>> Possibly. A reasoner will ask a lot of find operations and for
>> MultiUnion each find will be distributed to each graph which does
>> entail some overhead. If there is any redundancy between the graphs
>> then there could be duplicated traversals.
>>
>> You could test if it is MultiUnion or some other aspect of ng4j by
>> converting the data to an OntModel instead with addSubModel to add
>> each graph.
>
> It looks like it's not something ng4j specific:
>
> Pre-flattened n-triples:
> ~17.14s
> ng4j:
> ~395.08s
> ont-model:
> ~313.46s
>
> I'm running over the BSBM dataset, split into graphs.

How many graphs?

Are the graphs reasonably disjoint or do they contain redundant copies 
of assertions?
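One rough way to answer the redundancy question is to compare the total size of the separate graphs with the size of their union. The sketch below is a minimal illustration with hypothetical names, modelling each graph as a set of triple strings. With Jena models, the analogous comparison would be the sum of each submodel's size() against the size of a single default model into which every submodel has been added. A Jena graph is a set of triples, so that merged size counts each triple once.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; graphs are modelled as sets of
// triple strings, not Jena Model objects.
class RedundancySketch {

    // Sum of the per-graph sizes minus the size of their union: the number
    // of extra triple copies a union view has to filter out as duplicates.
    static int redundantCopies(List<Set<String>> graphs) {
        int total = 0;
        Set<String> distinct = new HashSet<>();
        for (Set<String> g : graphs) {
            total += g.size();
            distinct.addAll(g);
        }
        return total - distinct.size();
    }

    public static void main(String[] args) {
        List<Set<String>> graphs = List.of(
                Set.of(":a :p :x", ":b :p :y"),
                Set.of(":a :p :x", ":c :p :z"),
                Set.of(":a :p :x"));
        // Prints 2: ":a :p :x" appears in three graphs, i.e. two extra copies.
        System.out.println(redundantCopies(graphs));
    }
}
```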

> There are 29355
> triples before inference, and 40359 after.
>
> This is roughly what my code is doing:
>
> OntModel om = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>
> // Add a bunch of submodels read from the filesystem.
> //Model m = ModelFactory.createDefaultModel();
> //om.addSubModel(m);
>
> Assert.assertEquals(29355, om.size());
>
> Resource config = ModelFactory.createDefaultModel()
> .createResource()
> .addProperty(ReasonerVocabulary.PROPsetRDFSLevel,
> RDFSRuleReasoner.FULL_RULES);
> Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
> reasoner.setDerivationLogging(true);
>
> InfModel infm = ModelFactory.createInfModel(reasoner, om);
>
> Assert.assertEquals(40359, infm.size());

All looks reasonable.

> Do you have any other suggestions?

Not really, I'm afraid.

The reasoner is just doing find calls on the underlying model. It sounds 
like the overhead of MultiUnion routing each find to every submodel, and 
then running a uniqueness filter over the concatenated results, is 
costing you roughly 20x in performance (about 18x for the OntModel run 
and 23x for ng4j). Very surprising unless there's a LOT of redundancy 
between the graphs!
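The cost model described above can be sketched with hypothetical toy types rather than Jena classes. Each find is sent to every member graph, and the per-graph results are concatenated. A uniqueness filter then removes triples that appear in more than one graph. The gap between raw and distinct matches is extra work done on every call, and it grows with the redundancy between graphs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of a union-style find over several graphs.
// It is NOT Jena's MultiUnion implementation; triples are plain strings
// and "matching" is a substring test, purely for illustration.
class UnionFindSketch {

    // Outcome of one union find.
    static final class Result {
        final Set<String> distinct; // what the caller finally sees
        final int rawMatches;       // matches produced before deduplication

        Result(Set<String> distinct, int rawMatches) {
            this.distinct = distinct;
            this.rawMatches = rawMatches;
        }
    }

    static Result unionFind(List<Set<String>> graphs, String pattern) {
        List<String> concatenated = new ArrayList<>();
        // Route the same find to every member graph and concatenate the results.
        for (Set<String> g : graphs) {
            for (String triple : g) {
                if (triple.contains(pattern)) {
                    concatenated.add(triple);
                }
            }
        }
        // Uniqueness filter over the concatenated stream: a triple asserted
        // in several graphs is reported only once.
        Set<String> distinct = new LinkedHashSet<>(concatenated);
        return new Result(distinct, concatenated.size());
    }

    public static void main(String[] args) {
        List<Set<String>> graphs = List.of(
                Set.of(":a rdf:type :Product", ":a :label \"A\""),
                Set.of(":a rdf:type :Product", ":b rdf:type :Product"),
                Set.of(":a rdf:type :Product"));
        Result r = unionFind(graphs, "rdf:type");
        // Prints "4 raw, 2 distinct": all three graphs repeat the same :a triple.
        System.out.println(r.rawMatches + " raw, " + r.distinct.size() + " distinct");
    }
}
```

The linear scan inside each member only keeps the sketch short. A real in-memory graph would normally answer a find through its indexes.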

Sounds like some profiling of MultiUnion might be useful if anyone has 
spare capacity to look at that.

In the meantime, it seems you should create a merged graph to do the 
inference over, even if you do all the other work using the separate 
graphs.
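The merge-once approach can be sketched with the same kind of hypothetical toy types. The graphs are copied into one set a single time, so duplicates are removed up front. Each later lookup then scans one collection with no per-call filter. In Jena terms this would roughly mean creating a model with ModelFactory.createDefaultModel() and calling add(m) on it for each submodel m. That model would then be passed to ModelFactory.createInfModel instead of the OntModel. Treat that mapping as an assumption to check against your Jena version.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "merge once, then reason over the merge";
// graphs are sets of triple strings, not Jena models.
class MergeOnceSketch {

    // Copy every named graph into one set, once. Triples asserted in more
    // than one graph collapse to a single entry here, so later lookups need
    // neither per-graph routing nor a per-call uniqueness filter.
    static Set<String> merge(List<Set<String>> graphs) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> g : graphs) {
            merged.addAll(g);
        }
        return merged;
    }

    // A lookup now touches a single collection only.
    static long countMatches(Set<String> merged, String pattern) {
        return merged.stream().filter(t -> t.contains(pattern)).count();
    }

    public static void main(String[] args) {
        List<Set<String>> graphs = List.of(
                Set.of(":a rdf:type :Product", ":a :label \"A\""),
                Set.of(":a rdf:type :Product", ":b rdf:type :Product"));
        Set<String> merged = merge(graphs);
        // Prints "3 triples, 2 rdf:type matches".
        System.out.println(merged.size() + " triples, "
                + countMatches(merged, "rdf:type") + " rdf:type matches");
    }
}
```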

Dave


Re: MultiUnion & Slow Reasoning

Posted by Marcus Cobden <li...@marcuscobden.co.uk>.
On 16/01/2012 21:05, Dave Reynolds wrote:
> On 16/01/12 11:42, Marcus Cobden wrote:
>> I've loaded a TRIG graph using ng4j, but I am finding that reasoning
>> over it is particularly slow.
>> After converting the same graph to N-Triples, and working with only jena
>> models the reasoning is a lot faster.
>>
>> Underneath, ng4j is using a MultiUnion graph to combine the named
>> graphs, would this be causing some slowness?
>
> Possibly. A reasoner will ask a lot of find operations and for MultiUnion each find will be distributed to each graph which does entail some overhead. If there is any redundancy between the graphs then there could be duplicated traversals.
>
> You could test if it is MultiUnion or some other aspect of ng4j by converting the data to an OntModel instead with addSubModel to add each graph.

It looks like it's not something ng4j-specific:

Pre-flattened N-Triples:
	~17.14s
ng4j:
	~395.08s
ont-model:
	~313.46s

I'm running over the BSBM dataset, split into graphs. There are 29355 triples before inference, and 40359 after.

This is roughly what my code is doing:

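	// Jena 2.x imports assumed: com.hp.hpl.jena.ontology.{OntModel, OntModelSpec},
	// com.hp.hpl.jena.rdf.model.{ModelFactory, Model, Resource, InfModel},
	// com.hp.hpl.jena.reasoner.Reasoner,
	// com.hp.hpl.jena.reasoner.rulesys.{RDFSRuleReasoner, RDFSRuleReasonerFactory},
	// com.hp.hpl.jena.vocabulary.ReasonerVocabulary, org.junit.Assert
	OntModel om = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
	
	// Add a bunch of submodels read from the filesystem, one per named graph:
	//Model m = ModelFactory.createDefaultModel();
	//om.addSubModel(m);

	// Size of the union view over all submodels, before inference.
	Assert.assertEquals(29355, om.size());
	
	// Configure the RDFS rule reasoner to use the full RDFS rule set.
	Resource config = ModelFactory.createDefaultModel()
			.createResource()
			.addProperty(ReasonerVocabulary.PROPsetRDFSLevel, RDFSRuleReasoner.FULL_RULES);
	Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
	// Record derivations so inferred triples can be explained later.
	reasoner.setDerivationLogging(true);
	
	// The reasoner works by issuing find() calls against om.
	InfModel infm = ModelFactory.createInfModel(reasoner, om);
	
	// Size after inference.
	Assert.assertEquals(40359, infm.size());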

Do you have any other suggestions?

Re: MultiUnion & Slow Reasoning

Posted by Dave Reynolds <da...@gmail.com>.
On 16/01/12 11:42, Marcus Cobden wrote:
> I've loaded a TRIG graph using ng4j, but I am finding that reasoning
> over it is particularly slow.
> After converting the same graph to N-Triples, and working with only jena
> models the reasoning is a lot faster.
>
> Underneath, ng4j is using a MultiUnion graph to combine the named
> graphs, would this be causing some slowness?

Possibly. A reasoner will issue a lot of find operations, and for 
MultiUnion each find will be distributed to each graph, which does entail 
some overhead. If there is any redundancy between the graphs then there 
could be duplicated traversals.

You could test whether it is MultiUnion or some other aspect of ng4j by 
loading the data into an OntModel instead, using addSubModel to add each 
graph.

> I think there might be a case of double-reification going on: MultiUnion
> overrides graphBaseFind() to call find() on the wrapped graphs, but
> MultiUnion.find() as inherited still checks its reifier.
>
> Is this a genuine source of slowness? Or should I be looking somewhere
> else?

Possibly; I would defer to Chris on the cost of reifier checking. However, 
my *guess* would be that, in the absence of any actual reification, the 
checks are low cost. Is there any reification?

Presumably you could create a flattened merge graph in code for 
reasoning purposes.

Dave