Posted to users@jena.apache.org by Benson Margulies <bi...@gmail.com> on 2010/12/09 15:11:27 UTC

I think I've got some pretty basic confusion with reading into models

Here's a little RDF snippet:

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rex="http://www.basistech.com/ontologies/2010/6/rex.owl#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rex:Location rdf:about="uri:c6c54ebb-a232-48cd-80fa-e4adf9cc5001#3">
    <rex:hasOriginalText rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Pakistan</rex:hasOriginalText>
    <rex:hasNormalizedText
rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >Pakistan</rex:hasNormalizedText>
    <rex:hasEntityDetectionSource
rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
    >gazetteer:../../../bt_root/rlp/rlp/dicts/en-all-gazetteer-LE.bin</rex:hasEntityDetectionSource>
  </rex:Location>
</rdf:RDF>

The following rather trivial test fails. I'm sure I'm just befuddled,
could some kind soul rescue me?

        OntModel model = ModelFactory.createOntologyModel();
        model.read(rdfStream, "", "RDF/XML-ABBREV");
        ExtendedIterator<Individual> individuals = model.listIndividuals();
        int individualCount = 0;
        while (individuals.hasNext()) {
            individuals.next(); // advance the iterator, otherwise the loop never ends once anything is found
            individualCount++;
        }
        assertTrue("Should have more than two individuals, but got " + individualCount,
                individualCount > 0);

Re: I think I've got some pretty basic confusion with reading into models

Posted by Dave Reynolds <da...@gmail.com>.
On Thu, 2010-12-09 at 18:34 -0500, Benson Margulies wrote: 
> Yes. Now I'm 'merely' a bit fuddled about baseURI.
> 
> I want the about of the Ontology element to be "". 

That's just a relative URI, relative to the base of the document.

> To get this, the
> reading suggests that I need to pass the URI that is the xml:base URI
> of the entire document. I don't see any way to set a base URI of the
> live model, but I do see a way to specify one at the time I write out
> a model. Is that the plan? Pick a URI, pass it in to the
> createOntology call, and then later pass it to the writer as base? 

Yes. URIs internally in Jena are all absolute. You can read/write them
as relative to some base.
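
As a rough illustration of the relative/absolute point, here is a minimal sketch that uses java.net.URI from the JDK rather than Jena itself. The base URI http://example.org/rex/doc1 and the class name BaseUriSketch are made up for the example. The idea is only that the stored form is absolute, and that an rdf:about of "" is what you get when the ontology URI equals the base used at write time.

```java
import java.net.URI;

// Sketch only: java.net.URI stands in for the relative/absolute handling
// that an RDF reader or writer performs. This is not Jena code.
final class BaseUriSketch {

    // Writing: shorten an absolute URI against the document base.
    static String toRelative(String base, String absolute) {
        return URI.create(base).relativize(URI.create(absolute)).toString();
    }

    // Reading: expand a fragment-only reference such as "#Location".
    static String toAbsolute(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.org/rex/doc1"; // hypothetical base URI
        // The ontology URI itself, written relative to the same base.
        System.out.println("[" + toRelative(base, base) + "]");
        // A resource inside the document, written relative to the base.
        System.out.println(toRelative(base, base + "#Location"));
        // The same reference expanded back to its absolute form.
        System.out.println(toAbsolute(base, "#Location"));
    }
}
```

So picking one URI, using it for the Ontology resource, and handing the same URI to the writer as the base is what makes the two line up.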

> Is
> it important for it to be unique?

Generally, yes.

By creating an Ontology element you are declaring a URI for the whole
ontology which, by convention but not necessity, is the URL for the
document in which the ontology[*] is published.

As I mentioned, this whole Ontology element/imports stuff in OWL is a
pretty document centric view.

There is no fundamental reason you can't have different graphs with
different content but the same Ontology URI in each. But you are
implicitly saying that they each represent a document with the same URI
which is probably not what you mean.

Cheers,
Dave

[*] Note that in this OWL view of the world (as opposed to most of RDF)
your instance data is an ontology that includes both your individuals
(so called "abox") and the bits of the ontology you wrote with Protege
and imported (so called "tbox").



Re: I think I've got some pretty basic confusion with reading into models

Posted by Benson Margulies <bi...@gmail.com>.
Yes. Now I'm 'merely' a bit fuddled about baseURI.

I want the about of the Ontology element to be "". To get this, the
reading suggests that I need to pass the URI that is the xml:base URI
of the entire document. I don't see any way to set a base URI of the
live model, but I do see a way to specify one at the time I write out
a model. Is that the plan? Pick a URI, pass it in to the
createOntology call, and then later pass it to the writer as base? Is
it important for it to be unique?



Re: I think I've got some pretty basic confusion with reading into models

Posted by Dave Reynolds <da...@gmail.com>.
On 09/12/2010 22:44, Benson Margulies wrote:
> "Add the appropriate import"
>
> is that 'addLoadedImport' or something else I'm missing?

Probably not.

There are a couple of different notions here and the answer depends on 
exactly how you want this to work :)

First, thinking of your graph as an OWL document then there is the 
notion of adding an import statement into the document. This means 
adding statements to the graph (that "clutter" is one of the reasons I 
tend to avoid explicit imports in internal processing chains). You can 
do that using OntModel#createOntology to create a resource of type 
owl:Ontology corresponding to this graph and then use Ontology#addImport 
to add the import statement to that ontology resource.

Second, there's the processing that means an OntModel "sees" all the 
statements in the ontology. If you just load in a model with an import 
statement (such as created by the above) then the OntModel will by 
default look for that Ontology, read it in, and add it as a subModel 
to the OntModel.

Now you can programmatically add a subModel anyway without having an 
import statement by just calling OntModel#addSubModel. This can be 
useful when you are doing some processing on a graph and want to see it 
along with its ontology closure but don't want to mess with the base 
graph itself by adding import statements.
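
To make the "see it along with its ontology, but leave the base graph alone" idea concrete, here is a conceptual sketch in plain Java. It is not Jena's OntModel: the class UnionViewSketch, its method names, and the statement strings are all hypothetical, and sets of strings stand in for RDF graphs. It only shows the shape of the behaviour: attaching a sub-model changes what a query over the whole model sees, without adding anything to the base.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Conceptual sketch only: sets of statement strings stand in for RDF graphs.
// This class is hypothetical and is not Jena's OntModel.
final class UnionViewSketch {
    private final Set<String> base = new LinkedHashSet<>();
    private final List<Set<String>> subModels = new ArrayList<>();

    // Add a statement to the base graph.
    void add(String statement) {
        base.add(statement);
    }

    // Attach another graph as a sub-model; the base graph is left untouched.
    void addSubModel(Set<String> other) {
        subModels.add(other);
    }

    // Only the statements that belong to the base graph.
    Set<String> baseStatements() {
        return base;
    }

    // What a query over the whole model sees: base plus all sub-models.
    Set<String> allStatements() {
        Set<String> union = new LinkedHashSet<>(base);
        for (Set<String> sub : subModels) {
            union.addAll(sub);
        }
        return union;
    }

    public static void main(String[] args) {
        UnionViewSketch model = new UnionViewSketch();
        model.add("uri:doc1#3 rdf:type rex:Location");                 // instance data
        model.addSubModel(Set.of("rex:Location rdf:type owl:Class")); // ontology
        System.out.println("base:  " + model.baseStatements().size());
        System.out.println("union: " + model.allStatements().size());
    }
}
```

The explicit-import route differs in that the import statement itself becomes part of the base graph, which is the "clutter" mentioned above.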

Sometimes you do this addSubModel yourself but later on want to do some 
import processing anyway. I believe OntModel#addLoadedImport is there for 
that case: it records that an import has been done somehow. Is that 
right Ian?

Does that all make sense?

Dave


Re: I think I've got some pretty basic confusion with reading into models

Posted by Benson Margulies <bi...@gmail.com>.
"Add the appropriate import"

is that 'addLoadedImport' or something else I'm missing?

Re: I think I've got some pretty basic confusion with reading into models

Posted by Benson Margulies <bi...@gmail.com>.
That's a great idea. Thanks.


Re: I think I've got some pretty basic confusion with reading into models

Posted by Dave Reynolds <da...@gmail.com>.
Hi Benson,

I'll reply properly later (deadlines looming) but a quick thought ...
have you looked at the RDF coming out of OpenCalais and/or the TSO
Document Enrichment Service?

I can't offhand recall to what extent they use OWL, whether they are DL
compliant and whether they reference their ontologies using explicit
owl:imports.  But it might be worth a quick look at those to see how
others in a similar space are handling it.

Cheers,
Dave





Re: I think I've got some pretty basic confusion with reading into models

Posted by Dave Reynolds <da...@gmail.com>.
Hi Benson,

Given that sketch of the process I don't see any particular advantage in 
explicitly importing the ontology into each graph. It would certainly 
not be harmful but doesn't seem necessary.

There doesn't seem to be any inference involved (the primary reason to 
worry about DL conformance). Process 1 can generate a bare graph, 
process 2 can store it and process 3 can query and visualize the data 
without needing the ontology import. If you want to publish the data on 
the web as an OWL file then adding an import would be reasonable but can 
be done as part of the publication step and doesn't have to be 
maintained through the internal chain.

The small reasons to not include the imports during the internal process 
chain are:

(1) When you create OntModels over the graphs the ontology will need 
to be loaded in from somewhere and will be included in queries over it 
(that's the point of import). This is an overhead and has the potential 
to cause problems if the link to the imported ontology breaks.

(2) The whole notion of imports is a document-oriented notion. It sounds 
like you do have a document-centric processing mode with a separate graph 
for each processed source document. However, you may in the future be 
merging graphs, and ending up with multiple ontology declarations in the 
merged graphs can be unhelpful clutter.

So six of one, half a dozen of the other, YMMV.

FWIW in almost all applications I've developed I've not included 
explicit import statements in internal graph stores or processing chains 
but have sometimes inserted them in published output.

Dave




Re: I think I've got some pretty basic confusion with reading into models

Posted by Benson Margulies <bi...@gmail.com>.
Dave,

Permit me to take this up one more conceptual level.

So, here at Basis we have a named entity extractor, plus we have been
dabbling in JAPE rules to build some relationship extraction, and we
have a coref system coming on line.

We want to represent the output of these things in RDF, and then do
'interesting' queries in the RDF we come up with, and we eventually
want to extend to querying dbpedia.

I read a few books and tutorials and concluded that it made sense to
work OWL-ishly instead of with naked RDF. I flirted with Proton, but
decided for now to use my own little ontology.

The flow is that one process does all this NLP and derives RDF/OWL,
one graph per source. The second process takes these graphs and wants
to stuff them into a store. And the third will do queries and
visualization. On another thread I'm borrowing Andy's neurons on the
subject of choosing a tuple store.

After reading your messages, my thought is that I need to add the
import into 'process 1', I don't really need any model in 'process 2'
if I'm just pushing RDF/XML from here to there, and that in process 3
the big question is to pick a store.

Can you give me a pointer to read up to conform to OWL/DL, which from
your email seems like it's what I'm stumbling toward doing?

Or do you care to give me a shove in some other direction altogether?

thanks,
benson

Re: I think I've got some pretty basic confusion with reading into models

Posted by Dave Reynolds <da...@gmail.com>.
On Thu, 2010-12-09 at 11:51 -0500, Benson Margulies wrote: 
> Dave,
> 
> Thanks.
> 
> I think that I may have still failed to give you enough information to
> reason about my level of confusion, so I have a few followup
> questions.
> 
> 1.  I think I'm confused about 'Things'. My ontology doesn't mention
> Thing anywhere explicitly. I have owl:Class elements and
> owl:DatatypeProperty elements. When I create an individual with the
> OntModel, is it a thing _ex officio_?

Yes pretty much.

The notion of an "Individual" is something that makes the most sense in
OWL DL.

In RDF and RDFS there are just resources, some of those resources are
also Classes and/or Properties but there's no separation.

In OWL DL there is a strict separation between Classes,
ObjectProperties, DatatypeProperties and Individuals [*].
In OWL DL all resources which are not Classes or Properties are
Individuals. All classes are subclasses of owl:Thing so if you declare
some class C and say:
   i rdf:type C
then it is possible to infer:
   i rdf:type owl:Thing.

> 2. When I set up my OntModel to create my individuals in the first
> place, should I be, somehow, attaching the URI that corresponds to my
> ontology to it?

Depends what you are trying to do.

To work within strict OWL DL then to use something as a Class you have
to declare it as such either within the model or via an import.

You can achieve that by adding an appropriate import statement to the
OntModel when you create it and letting the OntDocument machinery load
in the ontology (as a sub-model). 

Though given your next point it sounds like you don't want to do
that and strict DL is not your goal.

> 3. I was really _only_ expecting to see the rex: items and not the
> ontology itself. This leads me to ask: If my goal is just to take one
> of these documents and push it into Mulgara, doing no inference on the
> way, should I just be using a plain model and checking my work by
> looking at Resources instead of Individuals? I think that the answer
> is 'yes' based on your email, but I wanted to check.

Not necessarily, though that's certainly OK.

You can perfectly well use OntModels with no inference (see the
OntModelSpec argument in my example) and no imports. The OntModel API
mostly works happily for pure RDF, RDFS and OWL/Full too and has some
useful convenience methods. 

It is just this concept of Individual that is a bit confusing in those
cases.

So you could use the OntModel but fall back on just listing resources
(OntModels extend Models) for those cases, as in my example. Or indeed
use things like OntClass#listInstances  to get instances of specific
classes which I suspect is a more common usage.

> I am very grateful for your assistance in my process of bootstrapping
> myself into this whole area, where, as you can tell, I am starting
> from a very empty graph.

No worries, the whole stack is certainly pretty complex and confusing
and any API like Jena has some compromises and legacy in it.

Dave

[*] I'm ignoring OWL2's "punning" here.






Re: I think I've got some pretty basic confusion with reading into models

Posted by Benson Margulies <bi...@gmail.com>.
Dave,

Thanks.

I think that I may have still failed to give you enough information to
reason about my level of confusion, so I have a few followup
questions.

1.  I think I'm confused about 'Things'. My ontology doesn't mention
Thing anywhere explicitly. I have owl:Class elements and
owl:DatatypeProperty elements. When I create an individual with the
OntModel, is it a thing _ex officio_?

2. When I set up my OntModel to create my individuals in the first
place, should I be, somehow, attaching the URI that corresponds to my
ontology to it?

3. I was really _only_ expecting to see the rex: items and not the
ontology itself. This leads me to ask: If my goal is just to take one
of these documents and push it into Mulgara, doing no inference on the
way, should I just be using a plain model and checking my work by
looking at Resources instead of Individuals? I think that the answer
is 'yes' based on your email, but I wanted to check.

I am very grateful for your assistance in my process of bootstrapping
myself into this whole area, where, as you can tell, I am starting
from a very empty graph.

--benson






Re: I think I've got some pretty basic confusion with reading into models

Posted by Dave Reynolds <da...@gmail.com>.
Hi Benson,

On Thu, 2010-12-09 at 09:11 -0500, Benson Margulies wrote: 
> Here's a little RDF snippet:
> 
> <rdf:RDF
>     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>     xmlns:rex="http://www.basistech.com/ontologies/2010/6/rex.owl#"
>     xmlns:owl="http://www.w3.org/2002/07/owl#"
>     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
>     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
>   <rex:Location rdf:about="uri:c6c54ebb-a232-48cd-80fa-e4adf9cc5001#3">
>     <rex:hasOriginalText rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>     >Pakistan</rex:hasOriginalText>
>     <rex:hasNormalizedText
> rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>     >Pakistan</rex:hasNormalizedText>
>     <rex:hasEntityDetectionSource
> rdf:datatype="http://www.w3.org/2001/XMLSchema#string"
>     >gazetteer:../../../bt_root/rlp/rlp/dicts/en-all-gazetteer-LE.bin</rex:hasEntityDetectionSource>
>   </rex:Location>
> </rdf:RDF>
> 
> The following rather trivial test fails. I'm sure I'm just befuddled,
> could some kind soul rescue me?
> 
>         OntModel model = ModelFactory.createOntologyModel();
>         model.read(rdfStream, "", "RDF/XML-ABBREV");
>         ExtendedIterator<Individual> individuals = model.listIndividuals();
>         int individualCount = 0;
>         while (individuals.hasNext()) {
>             individualCount++;
>         }
>         assertTrue("Should have more than two individuals, but got " +
> individualCount, individualCount > 0);

Actually that's a tricky area :)

For OWL then we treat an individual as being anything of type
owl:Thing. 

If you have an OWL-capable reasoner configured for the OntModel then the
code can just query for that directly. If you don't then it looks for
all resources whose declared type is itself an owl:Class (and not part
of the OWL/RDFS "furniture").
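
A simplified model of that no-reasoner lookup may help show why nothing comes back. This is plain Java, not Jena's actual implementation: the class IndividualLookupSketch is hypothetical, triples are bare string arrays, and the filtering of OWL/RDFS "furniture" is left out. A subject is only reported when the class it is typed with is itself declared as an owl:Class in the same graph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of the lookup described above. It is not Jena's actual
// implementation; triples are plain {subject, predicate, object} arrays.
final class IndividualLookupSketch {
    static final String TYPE = "rdf:type";
    static final String OWL_CLASS = "owl:Class";

    // A subject counts as an individual if its declared type is itself
    // declared to be an owl:Class somewhere in the same graph.
    static Set<String> listIndividuals(List<String[]> graph) {
        Set<String> declaredClasses = new TreeSet<>();
        for (String[] t : graph) {
            if (t[1].equals(TYPE) && t[2].equals(OWL_CLASS)) {
                declaredClasses.add(t[0]);
            }
        }
        Set<String> individuals = new TreeSet<>();
        for (String[] t : graph) {
            if (t[1].equals(TYPE) && declaredClasses.contains(t[2])) {
                individuals.add(t[0]);
            }
        }
        return individuals;
    }

    public static void main(String[] args) {
        List<String[]> graph = new ArrayList<>();
        // Instance data only, as in the posted snippet: no class declaration.
        graph.add(new String[] {"uri:c6c54ebb-a232-48cd-80fa-e4adf9cc5001#3", TYPE, "rex:Location"});
        System.out.println(listIndividuals(graph)); // empty: rex:Location is undeclared

        // Once the ontology's declaration is visible, the instance is found.
        graph.add(new String[] {"rex:Location", TYPE, OWL_CLASS});
        System.out.println(listIndividuals(graph));
    }
}
```

In the real setup the declaration would come from the ontology, either imported or attached as a sub-model, rather than being added by hand.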

Since you don't have a declaration of rex:Location anywhere then nothing
bites.

Since your test comment suggests you are expecting at least 2 then I'm
guessing you mean to count rex:Location as well so you are probably just
after resources mentioned rather than Individuals in the OWL sense. In
that case try:

        OntModel model = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        model.read("file:data/temp2.rdf", "", "RDF/XML-ABBREV");

        ExtendedIterator<RDFNode> nodes =
                model.listObjects().andThen(model.listSubjects());
        int count = 0;
        while (nodes.hasNext()) {
            if (nodes.next().isResource()) count++;
        }


Note the explicit OntModelSpec to turn off the default RDFS reasoner
which would otherwise complicate things.


Actually checking this out revealed a possible bug in this area. RDFS
reasoning can infer that rex:Location is a rdfs:Class so if you set the
OntModel profile to RDFS then uri:c6c54ebb-a232-48cd-80fa-e4adf9cc5001#3
is treated as an individual. The trouble is that a lot of stuff in the
background RDFS knowledge is also treated as Individuals (28 of them,
including things that are rdfs:Class/rdf:Property). While technically
correct (because in RDFS all resources are Individuals) I would have
expected the OntModel conventions to filter those out. Ian will log a
JIRA case to check this out (assuming he can get access to JIRA).

Dave