Posted to users@jena.apache.org by Marcus Cobden <li...@marcuscobden.co.uk> on 2012/01/16 12:42:07 UTC
MultiUnion & Slow Reasoning
I've loaded a TriG graph using ng4j, but I am finding that reasoning over it is particularly slow.
After converting the same graph to N-Triples and working only with Jena models, the reasoning is a lot faster.
Underneath, ng4j is using a MultiUnion graph to combine the named graphs. Could this be causing the slowness?
I think there might be a case of double-reification going on: MultiUnion overrides graphBaseFind() to call find() on the wrapped graphs, but MultiUnion.find() as inherited still checks its reifier.
Is this a genuine source of slowness? Or should I be looking somewhere else?
Regards,
Marcus
Re: MultiUnion & Slow Reasoning
Posted by Dave Reynolds <da...@gmail.com>.
On 22/01/12 20:41, Marcus Cobden wrote:
> On 16/01/2012 21:05, Dave Reynolds wrote:
>> On 16/01/12 11:42, Marcus Cobden wrote:
>>> I've loaded a TRIG graph using ng4j, but I am finding that reasoning
>>> over it is particularly slow.
>>> After converting the same graph to N-Triples, and working with only jena
>>> models the reasoning is a lot faster.
>>>
>>> Underneath, ng4j is using a MultiUnion graph to combine the named
>>> graphs, would this be causing some slowness?
>>
>> Possibly. A reasoner will ask a lot of find operations and for
>> MultiUnion each find will be distributed to each graph which does
>> entail some overhead. If there is any redundancy between the graphs
>> then there could be duplicated traversals.
>>
>> You could test if it is MultiUnion or some other aspect of ng4j by
>> converting the data to an OntModel instead with addSubModel to add
>> each graph.
>
> It looks like it's not something ng4j specific:
>
> Pre-flattened n-triples:  ~17.14s
> ng4j:                     ~395.08s
> ont-model:                ~313.46s
>
> I'm running over the BSBM dataset, split into graphs.
How many graphs?
Are the graphs reasonably disjoint or do they contain redundant copies
of assertions?
> There are 29355
> triples before inference, and 40359 after.
>
> This is roughly what my code is doing:
>
> OntModel om = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);
>
> // Add a bunch of submodels read from the filesystem.
> //Model m = ModelFactory.createDefaultModel();
> //om.addSubModel(m);
>
> Assert.assertEquals(29355, om.size());
>
> Resource config = ModelFactory.createDefaultModel()
>         .createResource()
>         .addProperty(ReasonerVocabulary.PROPsetRDFSLevel,
>                 RDFSRuleReasoner.FULL_RULES);
> Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
> reasoner.setDerivationLogging(true);
>
> InfModel infm = ModelFactory.createInfModel(reasoner, om);
>
> Assert.assertEquals(40359, infm.size());
All looks reasonable.
> Do you have any other suggestions?
Not really, I'm afraid.
The reasoner is just doing find calls on the underlying model. It sounds
like the overheads of MultiUnion routing the find to each submodel and
then running a uniqueness filter over the concatenated results is
costing you 20x on performance. Very surprising unless there's a LOT of
redundancy between the graphs!
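To make that cost concrete, here is a toy model in plain Java. It is only a sketch of the idea, not Jena's MultiUnion code: "graphs" are sets of strings, and the names UnionFindSketch, findInUnion, flatten and findInFlat are made up for illustration. It shows a find being routed to every sub-graph and de-duplicated, next to the same find over a graph that was merged once up front.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Toy illustration only: triples are plain strings, not Jena Triple objects.
public class UnionFindSketch {

    // Counts how many graph scans were started, to make the fan-out visible.
    static int scans = 0;

    // Union-style find: route the pattern to every sub-graph, then let the
    // set drop duplicates (standing in for the "uniqueness filter").
    static Set<String> findInUnion(List<Set<String>> graphs, Predicate<String> pattern) {
        Set<String> unique = new LinkedHashSet<>();
        for (Set<String> g : graphs) {
            scans++;
            for (String t : g) {
                if (pattern.test(t)) {
                    unique.add(t);
                }
            }
        }
        return unique;
    }

    // Flattened alternative: copy every sub-graph into one set, once.
    static Set<String> flatten(List<Set<String>> graphs) {
        Set<String> merged = new LinkedHashSet<>();
        for (Set<String> g : graphs) {
            merged.addAll(g);
        }
        return merged;
    }

    // Find over the merged set: a single scan, no per-call de-duplication.
    static Set<String> findInFlat(Set<String> merged, Predicate<String> pattern) {
        scans++;
        Set<String> out = new LinkedHashSet<>();
        for (String t : merged) {
            if (pattern.test(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Three small sub-graphs; "a p b" is deliberately redundant.
        List<Set<String>> graphs = List.of(
                Set.of("a p b", "a q c"),
                Set.of("a p b", "d p e"),
                Set.of("f q g"));
        Predicate<String> pattern = t -> t.contains(" p ");

        scans = 0;
        Set<String> viaUnion = findInUnion(graphs, pattern);
        System.out.println("union: " + viaUnion.size() + " matches, " + scans + " scans");

        Set<String> merged = flatten(graphs);
        scans = 0;
        Set<String> viaFlat = findInFlat(merged, pattern);
        System.out.println("flat:  " + viaFlat.size() + " matches, " + scans + " scans");
    }
}
```

Both paths return the same matches, but the union path starts one scan per sub-graph on every find, and a rule reasoner issues a great many finds. With many small graphs that fan-out would add up, which is one plausible reading of the timings above; only profiling would confirm it.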
Sounds like some profiling of MultiUnion might be useful if anyone has
spare capacity to look at that.
In the meantime it seems like you should create a merge graph to do the
inference over, even if you do all the other work using the separated
graphs.
Dave
Re: MultiUnion & Slow Reasoning
Posted by Marcus Cobden <li...@marcuscobden.co.uk>.
On 16/01/2012 21:05, Dave Reynolds wrote:
> On 16/01/12 11:42, Marcus Cobden wrote:
>> I've loaded a TRIG graph using ng4j, but I am finding that reasoning
>> over it is particularly slow.
>> After converting the same graph to N-Triples, and working with only jena
>> models the reasoning is a lot faster.
>>
>> Underneath, ng4j is using a MultiUnion graph to combine the named
>> graphs, would this be causing some slowness?
>
> Possibly. A reasoner will ask a lot of find operations and for MultiUnion each find will be distributed to each graph which does entail some overhead. If there is any redundancy between the graphs then there could be duplicated traversals.
>
> You could test if it is MultiUnion or some other aspect of ng4j by converting the data to an OntModel instead with addSubModel to add each graph.
It looks like it's not something ng4j specific:
Pre-flattened n-triples:  ~17.14s
ng4j:                     ~395.08s
ont-model:                ~313.46s
I'm running over the BSBM dataset, split into graphs. There are 29355 triples before inference, and 40359 after.
This is roughly what my code is doing:
    OntModel om = ModelFactory.createOntologyModel(OntModelSpec.RDFS_MEM);

    // Add a bunch of submodels read from the filesystem.
    //Model m = ModelFactory.createDefaultModel();
    //om.addSubModel(m);

    Assert.assertEquals(29355, om.size());

    Resource config = ModelFactory.createDefaultModel()
            .createResource()
            .addProperty(ReasonerVocabulary.PROPsetRDFSLevel, RDFSRuleReasoner.FULL_RULES);
    Reasoner reasoner = RDFSRuleReasonerFactory.theInstance().create(config);
    reasoner.setDerivationLogging(true);

    InfModel infm = ModelFactory.createInfModel(reasoner, om);

    Assert.assertEquals(40359, infm.size());
Do you have any other suggestions?
Re: MultiUnion & Slow Reasoning
Posted by Dave Reynolds <da...@gmail.com>.
On 16/01/12 11:42, Marcus Cobden wrote:
> I've loaded a TRIG graph using ng4j, but I am finding that reasoning
> over it is particularly slow.
> After converting the same graph to N-Triples, and working with only jena
> models the reasoning is a lot faster.
>
> Underneath, ng4j is using a MultiUnion graph to combine the named
> graphs, would this be causing some slowness?
Possibly. A reasoner will issue a lot of find operations, and for
MultiUnion each find will be distributed to each graph, which does entail
some overhead. If there is any redundancy between the graphs then there
could be duplicated traversals.
You could test if it is MultiUnion or some other aspect of ng4j by
converting the data to an OntModel instead with addSubModel to add each
graph.
> I think there might be a case of double-reification going on: MultiUnion
> overrides graphBaseFind() to call find() on the wrapped graphs, but
> MultiUnion.find() as inherited still checks its reifier.
>
> Is this a genuine source of slowness? Or should I be looking somewhere
> else?
Possibly; I would defer to Chris on reifier checking cost. However, my
*guess* is that, in the absence of any actual reification, the checks
are low cost. Is there any reification?
Presumably you could create a flattened merge graph in code for
reasoning purposes.
Dave