Posted to users@jena.apache.org by m....@utwente.nl on 2012/03/08 15:37:30 UTC

Loading model into memory

Hi,
I want to apply the OntTools.findShortestPath function to YAGO.
I am using the following code to load the model:

Model model = TDBFactory.createModel(FullYagoDirectory);

The findShortestPath function takes too much time to return a result.
I wonder if it is possible to load the model into main memory to make it faster, or if there is any other way to make findShortestPath much faster.

Thanks a lot

Mena

----------------------------------------------------
Mena B. Habib
PhD Student

University of Twente
Faculty of Electrical Engineering, Mathematics and Computer Science.
Database Chair
7500AE Enschede, Netherlands

mail: m.b.habib@ewi.utwente.nl
website: http://wwwhome.ctit.utwente.nl/~badiehm/
Phone: +31 53 489 4549
Fax: +31 53 489 2927
Mobile: +31 68 183 2680


Re: Loading model into memory

Posted by Andy Seaborne <an...@apache.org>.
On 15/03/12 14:15, m.badiehhabibmorgan@utwente.nl wrote:
> Hi Andy, to use ARQ 2.9.1 I also need to use TDB 0.9. In the new TDB,
> the function CreateModel(String dir) no longer exists.

CreateModel or createModel?

TDBFactory.createModel() exists, but is deprecated.

> How do I read
> a model from disk (not from memory) with the new versions?

Better to use a dataset and get the default model.


> Furthermore, when I use ARQ 2.9.1 I always get this error:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/jena/iri/IRIFactory
> I am using jena-iri-0.9.0-incubating.jar.
> Thanks a lot

You need jena-iri-0.9.1-incubating-SNAPSHOT with ARQ
2.9.1-incubating-SNAPSHOT, as given by the POM.

We have seen some problems with Maven not properly updating the snapshot
dependencies; use mvn -U to force snapshots to be updated.

	Andy


RE: Loading model into memory

Posted by m....@utwente.nl.
Hi Andy
To use ARQ 2.9.1 I also need to use TDB 0.9. In the new TDB, the function CreateModel(String dir) no longer exists. How do I read a model from disk (not from memory) with the new versions?
Furthermore, when I use ARQ 2.9.1 I always get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/jena/iri/IRIFactory
I am using jena-iri-0.9.0-incubating.jar
Thanks a lot

Mena

RE: Loading model into memory

Posted by m....@utwente.nl.
Hi Andy 
To use ARQ 2.9.0 I also need to use TDB 0.9.
In the new TDB, the function CreateModel(String dir) no longer exists. How do I read a model from disk (not from memory) with the new versions?

Thanks a lot

Mena

Re: Loading model into memory

Posted by Andy Seaborne <an...@apache.org>.
On 13/03/12 12:07, m.badiehhabibmorgan@utwente.nl wrote:
> Hi Andy
>
> I tried this query :
> Query query = QueryFactory.create("SELECT * WHERE { \"http://www.mpii.de/yago/resource/Alexandria\" DISTINCT(path) \"http://www.mpii.de/yago/resource/Egypt\" } ");
> But it gives me this error:
> Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "distinct" "DISTINCT "" at line 1, column 64.
>
> I am using ARQ 2.9.0, Core 2.7.0,  TDB 0.9.0
> I could not find ARQ 2.9.1

That query has a number of problems:

1/ DISTINCT(..) is only in the development build of ARQ 2.9.1 (location 
below).

2/ It's a syntax extension; you need "create(..., Syntax.syntaxARQ)" to
enable it.

3/ You need to put a path expression inside DISTINCT(...). It does not
find arbitrary paths; it matches a path you give it. You can't have
variables there either. I'm not sure what YAGO has for properties. A FOAF
example is:

  ?s distinct(foaf:knows+) ?t

although, for work-in-progress reasons, it is the same as

  ?s foaf:knows+ ?t

(unlike in ARQ 2.9.0).
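To make the meaning of a one-or-more path such as ?s foaf:knows+ ?t concrete, here is a minimal stand-alone Java sketch that evaluates it from a fixed start node. It is not ARQ's implementation and uses no Jena classes: the Triple class, the string IRIs and the ex:livesIn property are hypothetical. It follows only the single property it is given, and the result set also serves as the visited set, so each reachable node is reported once and cycles terminate.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: evaluates "start property+ ?t" over in-memory triples.
// IRIs are plain strings; this is not how ARQ implements property paths.
final class OneOrMorePath {

    // Minimal stand-in for an RDF statement (subject, property, object).
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Returns every node reachable from start by one or more steps along
    // `property`. Each node appears once; the result set doubles as the
    // visited set, so cycles terminate.
    static Set<String> oneOrMore(List<Triple> triples, String start, String property) {
        Set<String> results = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String node = frontier.poll();
            // Linear scan for brevity; a real store would use an index on subject.
            for (Triple t : triples) {
                if (t.s.equals(node) && t.p.equals(property) && results.add(t.o)) {
                    frontier.add(t.o);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
            new Triple("alice", "foaf:knows", "bob"),
            new Triple("bob", "foaf:knows", "carol"),
            new Triple("carol", "foaf:knows", "alice"),
            new Triple("bob", "ex:livesIn", "paris"));
        // Prints the nodes reachable from alice via foaf:knows+, each once.
        System.out.println(oneOrMore(data, "alice", "foaf:knows"));
    }
}
```

Note that the result only says which nodes are connected to the start, not by which route.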

See also this email:
http://mail-archives.apache.org/mod_mbox/incubator-jena-users/201203.mbox/%3C4F5A6D38.5070204%40apache.org%3E

ARQ 2.9.1 development build:

https://repository.apache.org/content/repositories/snapshots/org/apache/jena/jena-arq/2.9.1-incubating-SNAPSHOT/

	Andy

RE: Loading model into memory

Posted by m....@utwente.nl.
Hi Andy

I tried this query:
Query query = QueryFactory.create("SELECT * WHERE { \"http://www.mpii.de/yago/resource/Alexandria\" DISTINCT(path) \"http://www.mpii.de/yago/resource/Egypt\" } ");

But it gives me this error: 
Exception in thread "main" com.hp.hpl.jena.query.QueryParseException: Encountered " "distinct" "DISTINCT "" at line 1, column 64.

I am using ARQ 2.9.0, Core 2.7.0,  TDB 0.9.0
I could not find ARQ 2.9.1

Thanks a lot

Mena

Re: Loading model into memory

Posted by Andy Seaborne <an...@apache.org>.
On 08/03/12 17:26, m.badiehhabibmorgan@utwente.nl wrote:
> It can be useful in this way. But how do I run it using the Jena library? Is there a function I can use to apply such a query to a model?

You can make a SPARQL query or call the path evaluator PathEval directly.

You will need ARQ 2.9.1-incubating-SNAPSHOT from the Apache snapshot 
repository.

	Andy


RE: Loading model into memory

Posted by m....@utwente.nl.
It can be useful in this way. But how do I run it using the Jena library? Is there a function I can use to apply such a query to a model?

Thanks a lot

Mena

Re: Loading model into memory

Posted by Andy Seaborne <an...@apache.org>.
On 08/03/12 16:06, m.badiehhabibmorgan@utwente.nl wrote:
> What I want to do is to see how two entities are related. I am using the shortest path as a measure of relatedness. I don't care what kind of relations they have; this is why I can't specify a particular property.
> I wonder how { :x DISTINCT(path) ?y } works. What output is expected for this query? Is it the same as getting the shortest path from x to y?

It's not the shortest path - in fact, it does not say what the path is 
at all, only that ?y is connected to :x.

	Andy


RE: Loading model into memory

Posted by m....@utwente.nl.
What I want to do is to see how two entities are related. I am using the shortest path as a measure of relatedness. I don't care what kind of relations they have; this is why I can't specify a particular property.
I wonder how { :x DISTINCT(path) ?y } works. What output is expected for this query? Is it the same as getting the shortest path from x to y?

Thanks a lot

Mena

Re: Loading model into memory

Posted by Andy Seaborne <an...@apache.org>.
On 08/03/12 15:03, Chris Dollin wrote:
> Mena said:
>
>> I want to apply the OntTools.findShortestPath function to YAGO.
>> I am using the following code to load the model:
>>
>> Model model = TDBFactory.createModel(FullYagoDirectory);
>>
>> The findShortestPath function takes too much time to return a result.
>> I wonder if it is possible to load the model into main memory to make it
>> faster or if there is any other way to make findShortestPath much faster.
>
>      Model model = ModelFactory.createDefaultModel().add( TDBFactory.createModel(FullYagoDirectory) );
>
> Of course you may then run out of memory if the model is big.
>
> Chris
>
> ("Default" models are in-memory models.)

IIRC YAGO(2) is a bit big. The core is something like 30 million
triples and the full version 80 million triples, I think.

That's a bit big for memory unless you have a big server.

Do you need "shortest path", or is just connectivity between entities acceptable?

ARQ now has DISTINCT for paths and executes it (more) efficiently:

{ :x DISTINCT(path) ?y }

in the ARQ language.

(more to come here ... "soon")


If you do want "shortest path", you may need to simplify the problem.

Jena's OntTools shortest path is quite general. Can you work with, say,
paths that follow a single fixed property?

If so, maybe extract all the occurrences of that property into a
subgraph, hopefully a much smaller one.
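A minimal sketch of that reduction in plain Java, assuming a hypothetical Triple class with string IRIs rather than Jena's API: keep only the triples that use the chosen property and index them as a subject-to-objects adjacency list, the form that graph algorithms usually work on. In Jena itself the selection step would correspond to listing the statements with that property (for example Model.listStatements(null, property, (RDFNode) null)), but the code below does not depend on Jena, and the ex: property names are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: reduce a triple list to the subgraph of one property.
final class PropertySubgraph {

    // Minimal stand-in for an RDF statement (subject, property, object).
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
    }

    // Keeps only the triples whose property equals `property` and returns them
    // as an adjacency list: subject -> objects, in input order.
    static Map<String, List<String>> extract(List<Triple> triples, String property) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (Triple t : triples) {
            if (t.p.equals(property)) {
                adjacency.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t.o);
            }
        }
        return adjacency;
    }

    public static void main(String[] args) {
        // Illustrative data; the property names are invented for this example.
        List<Triple> data = List.of(
            new Triple("Alexandria", "ex:locatedIn", "Egypt"),
            new Triple("Alexandria", "ex:foundedBy", "Alexander"),
            new Triple("Egypt", "ex:locatedIn", "Africa"));
        System.out.println(extract(data, "ex:locatedIn"));
    }
}
```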

You may need to look at a graph algorithm like the Floyd-Warshall
algorithm [*], which is space-consuming (O(N^2) memory) and O(N^3) in
time. Being able to reduce to something smaller helps with the space consumption.
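For reference, a textbook Floyd-Warshall over an unweighted directed graph looks roughly like this. It is a stand-alone Java sketch that assumes the nodes of the (reduced) subgraph have already been numbered 0..n-1, a mapping step not shown here. The n x n distance matrix is where the space goes: with int entries it needs 4 * n^2 bytes, so a hypothetical one million nodes would already need about 4 TB, which is why shrinking the graph first matters.

```java
import java.util.Arrays;

// Textbook Floyd-Warshall for an unweighted directed graph with nodes 0..n-1.
// O(n^3) time and an n x n matrix, i.e. O(n^2) memory.
final class FloydWarshall {

    // "No path" marker; half of MAX_VALUE so INF + INF cannot overflow.
    static final int INF = Integer.MAX_VALUE / 2;

    // edges[i] = {from, to}; every edge has length 1 (one triple = one hop).
    static int[][] allPairs(int n, int[][] edges) {
        int[][] dist = new int[n][n];
        for (int i = 0; i < n; i++) {
            Arrays.fill(dist[i], INF);
            dist[i][i] = 0;
        }
        for (int[] e : edges) {
            dist[e[0]][e[1]] = Math.min(dist[e[0]][e[1]], 1);
        }
        // Allow paths through intermediate nodes 0..k, one k at a time.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (dist[i][k] + dist[k][j] < dist[i][j]) {
                        dist[i][j] = dist[i][k] + dist[k][j];
                    }
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] dist = allPairs(4, new int[][] {{0, 1}, {1, 2}, {2, 3}, {0, 2}});
        System.out.println(Arrays.deepToString(dist));
    }
}
```

Unlike a single search from one start node, this computes every pair at once, so it presumably only pays off when many source/target pairs are needed.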

(OntTools.findShortestPath is a simple breadth-first search.)
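A breadth-first shortest-path search of that general kind can be sketched as below. This is stand-alone Java over a hypothetical Triple class with string IRIs, not Jena's OntTools code: it follows triples from subject to object whatever their property, records for each newly visited node the triple that first reached it, and walks those links back once the target is dequeued. On a densely linked graph each extra hop can multiply the number of nodes visited, which is where restricting the properties can help.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a BFS shortest path over in-memory triples.
final class ShortestPathBfs {

    // Minimal stand-in for an RDF statement (subject, property, object).
    static final class Triple {
        final String s, p, o;
        Triple(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
        @Override public String toString() { return s + " " + p + " " + o; }
    }

    // Returns the triples on one shortest path from start to end, an empty
    // list if start equals end, or null if end is unreachable.
    static List<Triple> shortestPath(List<Triple> triples, String start, String end) {
        // Index outgoing triples by subject so each node's edges are found directly.
        Map<String, List<Triple>> out = new HashMap<>();
        for (Triple t : triples) out.computeIfAbsent(t.s, k -> new ArrayList<>()).add(t);

        // For each visited node, the triple used to reach it (start maps to null).
        Map<String, Triple> via = new HashMap<>();
        via.put(start, null);
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            if (node.equals(end)) {
                // Walk back from end to start to recover the path.
                List<Triple> path = new ArrayList<>();
                for (Triple t = via.get(node); t != null; t = via.get(t.s)) path.add(t);
                Collections.reverse(path);
                return path;
            }
            for (Triple t : out.getOrDefault(node, Collections.emptyList())) {
                if (!via.containsKey(t.o)) {
                    via.put(t.o, t);
                    queue.add(t.o);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Illustrative triples with made-up property names.
        List<Triple> data = List.of(
            new Triple("Alexandria", "ex:p1", "X"),
            new Triple("X", "ex:p2", "Egypt"),
            new Triple("Alexandria", "ex:p3", "Egypt"));
        System.out.println(shortestPath(data, "Alexandria", "Egypt"));
    }
}
```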

	Andy


[*] http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm





Re: Loading model into memory

Posted by Chris Dollin <ch...@epimorphics.com>.
Mena said:

> I want to apply the OntTools.findShortestPath function to YAGO.
> I am using the following code to load the model:
> 
> Model model = TDBFactory.createModel(FullYagoDirectory);
> 
> The findShortestPath function takes too much time to return a result.
> I wonder if it is possible to load the model into main memory to make it
> faster or if there is any other way to make findShortestPath much faster.

    Model model = ModelFactory.createDefaultModel().add(TDBFactory.createModel(FullYagoDirectory));

Of course you may then run out of memory if the model is big.

Chris

("Default" models are in-memory models.)

-- 
"I don't want to know what the Structuralists think! I want     /Archer's Goon/
 to know what YOU think!"

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT
Epimorphics Ltd. is a limited company registered in England (number 7016688)