Posted to users@jena.apache.org by Philipp Singer <ki...@gmail.com> on 2012/08/24 11:30:29 UTC

Semantic grounding with Jena (Yago)

Hey guys!

I am currently working with Wikipedia and have computed, for a given 
Wiki page, other semantically related Wiki pages. I now want to evaluate 
my results against a gold standard ontology. My problem is that it's my 
first time working with ontologies and I am stuck.

My first intuition was to use YAGO, DBpedia or something like Cyc. I 
have now started to use Yago. I used Jena (Java) to load the TDB-based 
directory.

My goal now is to find the shortest path between two Wikipedia pages as 
a kind of evaluation. I thought of using Jena's shortest path 
implementation [1], but I can't figure out exactly how to do it.

I have loaded Yago to Jena the following way:

Model m = TDBFactory.createDataset(args[0]).getDefaultModel();
OntModel o = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, m);

Let's suppose that for the given page "Alabama" I have computed a 
semantically related page called "United_States" and I want to evaluate 
it. If I browse Yago on their website [2], I see that Alabama is located 
in the United States.

I get the resources the following way:

Resource alabama = o.getOntResource("http://yago-knowledge.org/resource/Alabama");
Resource us = o.getOntResource("http://yago-knowledge.org/resource/United_States");

So the shortest path should have length 1. However, when I try out the 
algorithm in Jena, it doesn't work.

I then tried printing the "isLocatedIn" statements of the United States 
just to get a feeling for the data. For example, I got the following 
output for S --> P --> O:

http://yago-knowledge.org/resource/United_States http://yago-knowledge.org/resource/isLocatedIn In_the_Money

Isn't that the wrong direction? To my intuition this would mean that the 
US is located in "In_The_Money".

Finally, I just tried out a SPARQL query for "Alabama". For example, I got:

http://yago-knowledge.org/resource/Alabama http://yago-knowledge.org/resource/isLocatedIn "Hanceville,_Alabama"^^http://www.w3.org/2001/XMLSchema#string

That's again the wrong direction somehow. It's also just a string. How 
could I use such things for the shortest path thing?

I am just really, really confused overall. I hope I have described my 
problems in a reasonably understandable way ;)

Regards, Philipp

[1] http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/ontology/OntTools.html
[2] https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/Browser?entity=Alabama


Re: Semantic grounding with Jena (Yago)

Posted by Dave Reynolds <da...@gmail.com>.
On 24/08/12 10:30, Philipp Singer wrote:
> Hey guys!
>
> I am currently working with Wikipedia and have computed, for a given
> Wiki page, other semantically related Wiki pages. I now want to evaluate
> my results against a gold standard ontology. My problem is that it's my
> first time working with ontologies and I am stuck.

I think your problem is this particular dataset rather than ontologies 
in general or Jena specifically. You need to contact the folks who 
developed the data.

> My first intuition was to use YAGO, DBPedia or something like Cyc. I
> have now started to use Yago. I used Java's Jena to load the TDB based
> directory.
>
> My goal now is to find the shortest path between two Wikipedia pages as
> a kind of evaluation. I thought of using Jena's shortest path
> implementation [1], but I can't figure out exactly how to do it.
>
> I have loaded Yago to Jena the following way:
>
> Model m = TDBFactory.createDataset(args[0]).getDefaultModel();
> OntModel o = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, m);
>
> Let's suppose that for the given page "Alabama" I have computed a
> semantically related page called "United_States" and I want to evaluate
> it. If I browse Yago on their website [2], I see that Alabama is located
> in the United States.

Are you sure which version of Yago you have?  I believe there have been 
several versions with substantial differences, as well as several 
different downloadable slices for each version. For example if I resolve 
the URI:

    http://yago-knowledge.org/resource/Alabama

Then I get a Virtuoso endpoint which seems to have only reified facts, 
no direct facts, and I couldn't spot a reified fact about isLocatedIn. 
Most requests were giving server errors, though, so it's hard to be sure.

Certainly doesn't look like the diagram in [2] which is Yago2.

> I get the resources the following way:
>
> Resource alabama = o.getOntResource("http://yago-knowledge.org/resource/Alabama");
> Resource us = o.getOntResource("http://yago-knowledge.org/resource/United_States");
>
> So the shortest path should be 1. However when I try out the algorithm
> in Jena it doesn't work.

Data problem.

The "YAGO2 core: N3 format" dump from [1], which omits the reified 
facts, has 732 statements of the form:

    <Alabama> y:isLocatedIn ?x .

But for all of those, ?x is a simple text literal, and none of those 
texts contains "United States". So there are no resource links at all in 
at least that particular case.

> I then tried printing the "isLocatedIn" statements of the United States
> just to get a feeling for the data. For example, I got the following
> output for S --> P --> O:
>
> http://yago-knowledge.org/resource/United_States http://yago-knowledge.org/resource/isLocatedIn In_the_Money
>
>
> Isn't that the wrong direction? In my intuition this would mean that the
> US are located in "In_The_Money".

I would agree, but the only way to tell is to find the ontology. If I 
dereference:

    http://yago-knowledge.org/resource/isLocatedIn

then I don't get anything intelligible back so that doesn't help.

> Finally, I just tried out a SPARQL query for "Alabama". For example, I got:
>
> http://yago-knowledge.org/resource/Alabama http://yago-knowledge.org/resource/isLocatedIn "Hanceville,_Alabama"^^http://www.w3.org/2001/XMLSchema#string
>
>
> That's again the wrong direction somehow. It's also just a string. How
> could I use such things for the shortest path thing?

You can't. That's what's in the data and the data doesn't appear to 
include isLocatedIn links between resources. You need a different data 
set or a different version of this data set.

Dave

[1] http://www.mpi-inf.mpg.de/yago-naga/yago/downloads.html