Posted to users@jena.apache.org by Benson Margulies <bi...@gmail.com> on 2010/12/31 14:57:12 UTC

Running out of memory in RDFS inference

Step 1:

    Model schema = ModelFactory.createDefaultModel();
    schema.read(RdfUtils.getJugOntology(),
                RdfUtils.getJugOntologyUri(), "RDF/XML");
    return ModelFactory.createRDFSModel(schema, data);
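Later in the thread, Dave points out that the RDFS reasoner knows nothing about owl:sameAs. A toy forward chainer makes that concrete. The following is a simplified, hypothetical sketch in plain Java. It is not Jena's implementation, since Jena's RDFS reasoner is built on its own rule engine. It applies only three RDFS entailment rules (rdfs2, rdfs7, rdfs9). `rex:Agent` is an invented superclass used purely for illustration.

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical toy model of RDFS forward chaining (NOT Jena's reasoner).
    // Applies rdfs2 (domain), rdfs7 (subPropertyOf) and rdfs9 (subClassOf)
    // repeatedly until no new triples appear.
    public class MiniRdfs {
        // Prefixed names stand in for full URIs.
        record Triple(String s, String p, String o) {}

        static final String TYPE = "rdf:type";
        static final String SUB_CLASS = "rdfs:subClassOf";
        static final String SUB_PROP = "rdfs:subPropertyOf";
        static final String DOMAIN = "rdfs:domain";

        static Set<Triple> closure(Set<Triple> schema, Set<Triple> data) {
            Set<Triple> all = new HashSet<>(schema);
            all.addAll(data);
            boolean changed = true;
            while (changed) {
                Set<Triple> fresh = new HashSet<>();
                for (Triple t : all) {
                    for (Triple r : all) {
                        // rdfs9: x type C, C subClassOf D  =>  x type D
                        if (t.p().equals(TYPE) && r.p().equals(SUB_CLASS) && r.s().equals(t.o()))
                            fresh.add(new Triple(t.s(), TYPE, r.o()));
                        // rdfs7: x P y, P subPropertyOf Q  =>  x Q y
                        if (r.p().equals(SUB_PROP) && r.s().equals(t.p()))
                            fresh.add(new Triple(t.s(), r.o(), t.o()));
                        // rdfs2: x P y, P domain C  =>  x type C
                        if (r.p().equals(DOMAIN) && r.s().equals(t.p()))
                            fresh.add(new Triple(t.s(), TYPE, r.o()));
                    }
                }
                // addAll returns true only if something new was added.
                changed = all.addAll(fresh);
            }
            return all;
        }

        public static void main(String[] args) {
            Set<Triple> schema = Set.of(new Triple("rex:Person", SUB_CLASS, "rex:Agent"));
            Set<Triple> data = Set.of(
                new Triple("e11", TYPE, "rex:Person"),
                new Triple("e11", "owl:sameAs", "e43"));
            Set<Triple> inferred = closure(schema, data);
            System.out.println(inferred.contains(new Triple("e11", TYPE, "rex:Agent")));
            System.out.println(inferred.contains(new Triple("e43", "owl:sameAs", "e11")));
        }
    }

Running `main` prints `true` and then `false`. The subclass rule fires, but no symmetric owl:sameAs triple is produced, because owl:sameAs appears in none of the rules.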

Step 2: about 50k tuples, many of them owl:sameAs

Step 3:

    // relatingProp is in fact owl:sameAs
    NodeIterator sameAsItems = model.listObjectsOfProperty(root, relatingProp);
    while (sameAsItems.hasNext()) {
        ...
    }
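Without OWL reasoning, Step 3 returns only the owl:sameAs objects asserted directly on `root`. A client may actually need every entity ref connected to `root` through owl:sameAs in either direction. If so, it can compute that set itself, with no reasoner involved. Here is a minimal plain-Java sketch. It assumes the sameAs (subject, object) pairs have already been pulled out of the model as strings. The class and method names are hypothetical and are not part of Jena.

    import java.util.*;

    // Hypothetical plain-model alternative: gather every resource linked to a root
    // through owl:sameAs, following edges in both directions, without a reasoner.
    public class SameAsGroup {
        // Build an adjacency map from asserted (subject, object) sameAs pairs.
        static Map<String, Set<String>> index(List<String[]> sameAsPairs) {
            Map<String, Set<String>> adj = new HashMap<>();
            for (String[] pair : sameAsPairs) {
                // owl:sameAs is symmetric, so record each edge both ways.
                adj.computeIfAbsent(pair[0], k -> new HashSet<>()).add(pair[1]);
                adj.computeIfAbsent(pair[1], k -> new HashSet<>()).add(pair[0]);
            }
            return adj;
        }

        // Breadth-first search from root; the result includes root itself.
        static Set<String> group(Map<String, Set<String>> adj, String root) {
            Set<String> seen = new LinkedHashSet<>();
            Deque<String> queue = new ArrayDeque<>();
            seen.add(root);
            queue.add(root);
            while (!queue.isEmpty()) {
                String node = queue.poll();
                for (String next : adj.getOrDefault(node, Set.of())) {
                    if (seen.add(next)) {
                        queue.add(next);
                    }
                }
            }
            return seen;
        }

        public static void main(String[] args) {
            List<String[]> pairs = List.of(
                new String[] {"e11", "e43"},
                new String[] {"e11", "e25"},
                new String[] {"e25", "e8"});
            Map<String, Set<String>> adj = index(pairs);
            System.out.println(group(adj, "e8").size()); // 4: e8, e25, e11, e43
        }
    }

The traversal touches each asserted sameAs edge a constant number of times. It therefore stays linear in the number of links, unlike a reasoner that materializes the full equality closure.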

Runs for a very long time, using a very large amount of memory.
Eventually runs out of memory.

I am at this point about to remove all dependencies on inference and
use a plain model. Is there something I should learn from this about
what inference is and isn't useful for?

Re: Running out of memory in RDFS inference

Posted by Benson Margulies <bi...@gmail.com>.
Dave,

I will set up a complete repro and push it to github.

--benson


On Sat, Jan 1, 2011 at 5:58 AM, Dave Reynolds <da...@gmail.com> wrote:
> On Fri, 2010-12-31 at 11:38 -0500, Benson Margulies wrote:
>> On Fri, Dec 31, 2010 at 11:27 AM, Dave Reynolds
>> <da...@gmail.com> wrote:
>> > On Fri, 2010-12-31 at 08:57 -0500, Benson Margulies wrote:
>> >> Step 1:
>> >>
>> >>  Model schema = ModelFactory.createDefaultModel();
>> >>         schema.read(RdfUtils.getJugOntology(),
>> >> RdfUtils.getJugOntologyUri(), "RDF/XML");
>> >>         return ModelFactory.createRDFSModel(schema, data);
>> >
>> > What's in the data?
>>
>> typical item:
>>
>> <uri:jug:0618936a7a03bf236a291bcddbfde63b#e11>
>>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>>                     rex:Person ;
>>       rex:hasEntityDetectionSource
>>                     "statistical" ;
>>       rex:hasNormalizedText
>>                     "Obama" ;
>>       rex:hasOriginalText
>>                     "Obama" ;
>>       rex:root      "true" ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e43> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e25> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e8> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e104> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e4> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e18> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e56> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e115> ;
>>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e100> .
>>
>>
>> >
>> >> Step 2: about 50k tuples, many of them owl:sameAs
>> >
>> > What is 50k tuples, the data, the schema, both, something else?
>>
>> data. Schema is tiny.
>
> Can you show us the schema?
>
>> >
>> >> Step 3:
>> >>
>> >> NodeIterator sameAsItems = model.listObjectsOfProperty(root,
>> >> relatingProp); // prop is in fact owl:sameAs
>> >>             while (sameAsItems.hasNext()) {
>> >>             ...
>> >>             }
>> >>
>> >> Runs for a very long time, using a very large amount of memory.
>> >> Eventually runs out of memory.
>> >
>> > Strange.  The owl:sameAs reasoning can be hugely expensive (it is
>> > fundamentally exponential) but the RDFS reasoner knows nothing about
>> > owl:sameAs so isn't doing any of that reasoning.
>>
>> Interrupting it in Eclipse, it is definitely deep in the reasoner all
>> the time until it runs out of memory and dies.
>
> Is there definitely not an outer loop running?
>
> I can imagine a space leak so repeated calls to the reasoner will use up
> memory but find it hard to see how RDFS reasoning with a tiny schema
> could blow up so badly.
>
> Do you have a complete minimal example we could take a look at?
>
> [I realize you've switched approach but I'd like to understand why RDFS
> reasoning might blow up in this case.]
>
> Dave

Re: Running out of memory in RDFS inference

Posted by Benson Margulies <bi...@gmail.com>.
I tried to recreate the state where it was chewing up memory, and I failed.

The obvious test case to simulate it did not reproduce the problem.

Re: Running out of memory in RDFS inference

Posted by Dave Reynolds <da...@gmail.com>.
On Fri, 2010-12-31 at 11:38 -0500, Benson Margulies wrote: 
> On Fri, Dec 31, 2010 at 11:27 AM, Dave Reynolds
> <da...@gmail.com> wrote:
> > On Fri, 2010-12-31 at 08:57 -0500, Benson Margulies wrote:
> >> Step 1:
> >>
> >>  Model schema = ModelFactory.createDefaultModel();
> >>         schema.read(RdfUtils.getJugOntology(),
> >> RdfUtils.getJugOntologyUri(), "RDF/XML");
> >>         return ModelFactory.createRDFSModel(schema, data);
> >
> > What's in the data?
> 
> typical item:
> 
> <uri:jug:0618936a7a03bf236a291bcddbfde63b#e11>
>       <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>                     rex:Person ;
>       rex:hasEntityDetectionSource
>                     "statistical" ;
>       rex:hasNormalizedText
>                     "Obama" ;
>       rex:hasOriginalText
>                     "Obama" ;
>       rex:root      "true" ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e43> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e25> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e8> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e104> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e4> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e18> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e56> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e115> ;
>       owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e100> .
> 
> 
> >
> >> Step 2: about 50k tuples, many of them owl:sameAs
> >
> > What is 50k tuples, the data, the schema, both, something else?
> 
> data. Schema is tiny.

Can you show us the schema? 

> >
> >> Step 3:
> >>
> >> NodeIterator sameAsItems = model.listObjectsOfProperty(root,
> >> relatingProp); // prop is in fact owl:sameAs
> >>             while (sameAsItems.hasNext()) {
> >>             ...
> >>             }
> >>
> >> Runs for a very long time, using a very large amount of memory.
> >> Eventually runs out of memory.
> >
> > Strange.  The owl:sameAs reasoning can be hugely expensive (it is
> > fundamentally exponential) but the RDFS reasoner knows nothing about
> > owl:sameAs so isn't doing any of that reasoning.
> 
> Interrupting it in Eclipse, it is definitely deep in the reasoner all
> the time until it runs out of memory and dies.

Is there definitely not an outer loop running?

I can imagine a space leak whereby repeated calls to the reasoner use up
memory, but I find it hard to see how RDFS reasoning with a tiny schema
could blow up so badly.

Do you have a complete minimal example we could take a look at?

[I realize you've switched approach but I'd like to understand why RDFS
reasoning might blow up in this case.]

Dave




Re: Running out of memory in RDFS inference

Posted by Benson Margulies <bi...@gmail.com>.
On Fri, Dec 31, 2010 at 11:27 AM, Dave Reynolds
<da...@gmail.com> wrote:
> On Fri, 2010-12-31 at 08:57 -0500, Benson Margulies wrote:
>> Step 1:
>>
>>  Model schema = ModelFactory.createDefaultModel();
>>         schema.read(RdfUtils.getJugOntology(),
>> RdfUtils.getJugOntologyUri(), "RDF/XML");
>>         return ModelFactory.createRDFSModel(schema, data);
>
> What's in the data?

typical item:

<uri:jug:0618936a7a03bf236a291bcddbfde63b#e11>
      <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
                    rex:Person ;
      rex:hasEntityDetectionSource
                    "statistical" ;
      rex:hasNormalizedText
                    "Obama" ;
      rex:hasOriginalText
                    "Obama" ;
      rex:root      "true" ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e43> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e25> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e8> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e104> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e4> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e18> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e56> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e115> ;
      owl:sameAs    <uri:jug:0618936a7a03bf236a291bcddbfde63b#e100> .


>
>> Step 2: about 50k tuples, many of them owl:sameAs
>
> What is 50k tuples, the data, the schema, both, something else?

data. Schema is tiny.

>
>> Step 3:
>>
>> NodeIterator sameAsItems = model.listObjectsOfProperty(root,
>> relatingProp); // prop is in fact owl:sameAs
>>             while (sameAsItems.hasNext()) {
>>             ...
>>             }
>>
>> Runs for a very long time, using a very large amount of memory.
>> Eventually runs out of memory.
>
> Strange.  The owl:sameAs reasoning can be hugely expensive (it is
> fundamentally exponential) but the RDFS reasoner knows nothing about
> owl:sameAs so isn't doing any of that reasoning.

Interrupting it in Eclipse, it is definitely deep in the reasoner all
the time until it runs out of memory and dies.

I've removed all use of reasoning.

To forestall further questions: I've also concluded that my current data
model, which represents every entity ref as an item in RDF, is not
viable, and I'll be changing it.

Re: Running out of memory in RDFS inference

Posted by Dave Reynolds <da...@gmail.com>.
On Fri, 2010-12-31 at 08:57 -0500, Benson Margulies wrote: 
> Step 1:
> 
>  Model schema = ModelFactory.createDefaultModel();
>         schema.read(RdfUtils.getJugOntology(),
> RdfUtils.getJugOntologyUri(), "RDF/XML");
>         return ModelFactory.createRDFSModel(schema, data);

What's in the data? 

> Step 2: about 50k tuples, many of them owl:sameAs

What is 50k tuples, the data, the schema, both, something else?

> Step 3:
> 
> NodeIterator sameAsItems = model.listObjectsOfProperty(root,
> relatingProp); // prop is in fact owl:sameAs
>             while (sameAsItems.hasNext()) {
>             ...
>             }
> 
> Runs for a very long time, using a very large amount of memory.
> Eventually runs out of memory.

Strange.  The owl:sameAs reasoning can be hugely expensive (it is
fundamentally exponential) but the RDFS reasoner knows nothing about
owl:sameAs so isn't doing any of that reasoning.
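Some back-of-envelope arithmetic shows one reason full owl:sameAs reasoning is costly. Under full sameAs semantics (reflexive, symmetric, transitive), a group of n mutually equal resources entails n * n sameAs triples, one for every ordered pair. Only n - 1 of those are asserted in a star like the record quoted above. Copying every other property onto every member of the group adds more on top of that. The sketch below is a hypothetical illustration of the counting only and does not describe any particular reasoner's internals.

    // Hypothetical size estimate for the owl:sameAs closure alone:
    // each group of n equal resources yields n * n entailed sameAs triples.
    public class SameAsClosureSize {
        static long entailedSameAsTriples(int[] groupSizes) {
            long total = 0;
            for (int n : groupSizes) {
                total += (long) n * n; // every ordered pair, including x sameAs x
            }
            return total;
        }

        public static void main(String[] args) {
            // The record quoted in this thread: e11 plus 9 sameAs targets = 10 resources.
            System.out.println(entailedSameAsTriples(new int[] {10})); // 100
        }
    }

In other words, the 9 asserted links in that one record would become 100 entailed sameAs triples under full OWL semantics.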

I suspect the problem lies somewhere other than inference.

> I am at this point about to remove all dependencies on inference and
> use a plain model. Is there something I should learn from this about
> what inference is and isn't useful for?

Not from this example.

Dave