Posted to users@jena.apache.org by Philipp Singer <ki...@gmail.com> on 2012/08/24 11:30:29 UTC
Semantic grounding with Jena (Yago)
Hey guys!
I am currently working on Wikipedia and have calculated, for a given
Wiki page, other semantically related Wiki pages. I now want to evaluate
my calculations against a gold-standard ontology. My problem is that
this is my first time working with ontologies and I am stuck.
My first intuition was to use YAGO, DBPedia or something like Cyc. I
have now started to use Yago. I used Java's Jena to load the TDB based
directory.
My goal now is to find the shortest path between two Wikipedia pages as
some kind of evaluation. I thought of using Jena's shortest path
implementation [1]. But I can't really figure out how exactly to do it.
I have loaded Yago to Jena the following way:
Model m = TDBFactory.createDataset(args[0]).getDefaultModel();
OntModel o = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, m);
Let's suppose that, for the given page "Alabama", I have calculated a
semantically related page called "United_States" and I want to evaluate
it. If I browse Yago on their website [2], I see that Alabama is located
in the United States.
I get the resources the following way:
Resource alabama = o.getOntResource("http://yago-knowledge.org/resource/Alabama");
Resource us = o.getOntResource("http://yago-knowledge.org/resource/United_States");
So the shortest path should have length 1. However, when I try out the
algorithm in Jena, it doesn't work.
I then tried to print out the property "isLocatedIn" of the United States
just to get a feeling for it. I got, for example, the following output
for S --> P --> O:
http://yago-knowledge.org/resource/United_States http://yago-knowledge.org/resource/isLocatedIn In_the_Money
Isn't that the wrong direction? In my intuition this would mean that the
US are located in "In_The_Money".
Finally, I just tried out a SPARQL query for "Alabama". For example, I got:
http://yago-knowledge.org/resource/Alabama http://yago-knowledge.org/resource/isLocatedIn "Hanceville,_Alabama"^^http://www.w3.org/2001/XMLSchema#string
That's again the wrong direction somehow. It's also just a string. How
could I use such things for the shortest path thing?
I am just really, really confused overall. I hope I have described my
problems understandably ;)
Regards, Philipp
[1]http://jena.apache.org/documentation/javadoc/jena/com/hp/hpl/jena/ontology/OntTools.html
[2]https://d5gate.ag5.mpi-sb.mpg.de/webyagospotlx/Browser?entity=Alabama
Re: Semantic grounding with Jena (Yago)
Posted by Dave Reynolds <da...@gmail.com>.
On 24/08/12 10:30, Philipp Singer wrote:
> Hey guys!
>
> I am currently working on Wikipedia and have calculated, for a given
> Wiki page, other semantically related Wiki pages. I now want to evaluate
> my calculations against a gold-standard ontology. My problem is that
> this is my first time working with ontologies and I am stuck.
I think your problem is this particular dataset rather than ontologies
in general or Jena specifically. You need to contact the folks who
developed the data.
> My first intuition was to use YAGO, DBPedia or something like Cyc. I
> have now started to use Yago. I used Java's Jena to load the TDB based
> directory.
>
> My goal now is to find the shortest path between two Wikipedia pages as
> some kind of evaluation. I thought of using Jena's shortest path
> implementation [1]. But I can't really figure out how exactly to do it.
>
> I have loaded Yago to Jena the following way:
>
> Model m = TDBFactory.createDataset(args[0]).getDefaultModel();
> OntModel o = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, m);
>
> Let's suppose that, for the given page "Alabama", I have calculated a
> semantically related page called "United_States" and I want to evaluate
> it. If I browse Yago on their website [2], I see that Alabama is located
> in the United States.
Are you sure which version of Yago you have? I believe there have been
several versions with substantial differences, as well as several
different downloadable slices for each version. For example if I resolve
the URI:
http://yago-knowledge.org/resource/Alabama
Then I get a Virtuoso endpoint which seems to have only reified facts,
no direct facts, and I couldn't spot a reified fact about isLocatedIn.
Most requests were giving server errors, though, so it's hard to be sure.
It certainly doesn't look like the diagram in [2], which is Yago2.
> I get the resources the following way:
>
> Resource alabama = o.getOntResource("http://yago-knowledge.org/resource/Alabama");
> Resource us = o.getOntResource("http://yago-knowledge.org/resource/United_States");
>
> So the shortest path should have length 1. However, when I try out the
> algorithm in Jena, it doesn't work.
Data problem.
The "YAGO2 core: N3 format" dump from [1], which omits the reified
facts, has 732 statements of the form:
<Alabama> y:isLocatedIn ?x .
But for all of those, ?x is a simple text literal, and none of those texts
contains "United States". So there are no resource links at all in at
least that particular case.
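A check of this kind boils down to counting how many objects of a given subject and predicate are resources rather than literals. Below is a minimal plain-Java sketch of that idea. It is not Jena code: the Triple class, the countResourceObjects helper and the second sample row are hypothetical, made up purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class LiteralCheck {
    /** Hypothetical stand-in for an RDF statement; not a Jena class. */
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;
        final boolean objectIsLiteral;

        Triple(String subject, String predicate, String object, boolean objectIsLiteral) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
            this.objectIsLiteral = objectIsLiteral;
        }
    }

    /** Counts statements (subject, predicate, ?x) where ?x is a resource, not a literal. */
    static long countResourceObjects(List<Triple> triples, String subject, String predicate) {
        long count = 0;
        for (Triple t : triples) {
            if (t.subject.equals(subject) && t.predicate.equals(predicate) && !t.objectIsLiteral) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample with the same shape as the dump: every object is a plain string.
        List<Triple> data = Arrays.asList(
            new Triple("Alabama", "isLocatedIn", "Hanceville,_Alabama", true),
            new Triple("Alabama", "isLocatedIn", "placeholder text", true));
        // No resource-valued objects, so this prints 0.
        System.out.println(countResourceObjects(data, "Alabama", "isLocatedIn"));
    }
}
```

A count of zero for a subject means there is nothing for a graph traversal to follow from it along that predicate.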
> I then tried to print out the property "isLocatedIn" of the United States
> just to get a feeling for it. I got, for example, the following output
> for S --> P --> O:
>
> http://yago-knowledge.org/resource/United_States http://yago-knowledge.org/resource/isLocatedIn In_the_Money
>
>
> Isn't that the wrong direction? In my intuition this would mean that the
> US are located in "In_The_Money".
I would agree but the only way to tell is to find the ontology. If I
dereference:
http://yago-knowledge.org/resource/isLocatedIn
then I don't get anything intelligible back so that doesn't help.
> Finally, I just tried out a SPARQL query for "Alabama". For example, I got:
>
> http://yago-knowledge.org/resource/Alabama http://yago-knowledge.org/resource/isLocatedIn "Hanceville,_Alabama"^^http://www.w3.org/2001/XMLSchema#string
>
>
> That's again the wrong direction somehow. It's also just a string. How
> could I use such things for the shortest path thing?
You can't. That's what's in the data and the data doesn't appear to
include isLocatedIn links between resources. You need a different data
set or a different version of this data set.
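To illustrate why string-valued objects are a dead end for a path search, here is a minimal breadth-first search sketch in plain Java. It is not Jena's implementation and makes no claim about how Jena works internally. The Edge class, the shortestPathLength helper and the sample edges are hypothetical. The only assumption is that a path can continue through an object only when that object is a resource.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class ShortestPathSketch {
    /** Hypothetical directed edge subject -> object; not a Jena class. */
    static final class Edge {
        final String from;
        final String to;
        final boolean toIsLiteral;

        Edge(String from, String to, boolean toIsLiteral) {
            this.from = from;
            this.to = to;
            this.toIsLiteral = toIsLiteral;
        }
    }

    /** Returns the number of edges on a shortest directed path, or -1 if none exists. */
    static int shortestPathLength(List<Edge> edges, String start, String goal) {
        // Build the adjacency list, skipping edges that end in a literal.
        Map<String, List<String>> adjacent = new HashMap<>();
        for (Edge e : edges) {
            if (e.toIsLiteral) {
                continue;
            }
            adjacent.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e.to);
        }
        // Standard breadth-first search, recording the distance from the start node.
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.remove();
            if (node.equals(goal)) {
                return distance.get(node);
            }
            for (String next : adjacent.getOrDefault(node, Collections.emptyList())) {
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(node) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Object stored as a string literal: the search cannot reach it.
        List<Edge> literalOnly = Arrays.asList(new Edge("Alabama", "United_States", true));
        System.out.println(shortestPathLength(literalOnly, "Alabama", "United_States")); // prints -1

        // The same fact stored as a resource link: path of length 1.
        List<Edge> linked = Arrays.asList(new Edge("Alabama", "United_States", false));
        System.out.println(shortestPathLength(linked, "Alabama", "United_States")); // prints 1
    }
}
```

With a data set that does link resources to resources, the same kind of search would give the length-1 path expected for Alabama and United_States.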
Dave
[1] http://www.mpi-inf.mpg.de/yago-naga/yago/downloads.html