Posted to users@jena.apache.org by Martin Vassilev <mr...@gmail.com> on 2013/08/05 15:15:14 UTC

jena framework performance

Hi all,

I would like to ask you a few questions about Jena.

I read in the documentation that
*Note:* Although OWL version 1.1 is now a W3C recommendation, Jena's 
support for OWL 1.1 features is limited

1. Can you tell me exactly what is not supported (or what is supported)
in OWL 1.1, and which version of OWL is fully supported?

2. Can Jena handle 2 or 3 billion triples using TDB?

3. Is it relatively easy to swap TDB for SDB?

Martin

Re: TDBLoader vs TDBloader2

Posted by Andy Seaborne <an...@apache.org>.
On 09/08/13 01:02, Charles Li wrote:
> Hi, All:

tdbloader is a script, as is tdbloader2.
TDBLoader is a class in TDB that implements tdbloader.
There is no equivalent class for tdbloader2 - it simply does not work
that way.

> 1. Is there any difference between the resultant data stores from the same RDF file using TDBLoader and TDBLoader2?

They contain the same triples and behave exactly the same.  A store built
by tdbloader2 is slightly more compact, but that compactness will slowly
disappear as data is added to the store.

Both loaders are optimized for the case of loading an empty store.

tdbloader will fall back and load into an existing store.
tdbloader2 won't - it'll destroy the old data.

> 2. What are some of the highlights of differences between TDBLoader and TDBLoader2?

tdbloader2 can be faster.  It only runs on Linux/Unix machines, not MS
Windows.

Indeed, if your sort(1) has --parallel, tdbloader2 can do some work in 
parallel.  See the script for details.

>
> 3. What are the guidelines on when to use TDBLoader and when to use TDBLoader2?

Use tdbloader unless you're into the hundreds of millions of triples.

Then try each and see what happens.
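
The rules of thumb in this reply can be condensed into a small decision helper. What follows is a minimal Python sketch, not part of Jena: the function name `choose_loader`, its parameters, and the 100-million-triple cutoff (one reading of "hundreds of millions") are assumptions made for illustration only.

```python
def choose_loader(triple_count, store_is_empty, on_unix, threshold=100_000_000):
    """Suggest which TDB bulk-loader script to try first.

    Encodes the rules of thumb from this thread. The threshold is an
    assumed stand-in for "hundreds of millions" and should be tuned by testing.
    """
    # tdbloader2 only runs on Linux/Unix and destroys any existing data,
    # so it is only a candidate for a fresh store on a Unix-like machine.
    if not on_unix or not store_is_empty:
        return "tdbloader"
    # Below the threshold, the advice is simply to use tdbloader.
    if triple_count < threshold:
        return "tdbloader"
    # At larger sizes tdbloader2 *can* be faster; the advice is to try both.
    return "tdbloader2 (and compare against tdbloader)"


print(choose_loader(5_000_000, store_is_empty=True, on_unix=True))
print(choose_loader(800_000_000, store_is_empty=True, on_unix=True))
print(choose_loader(800_000_000, store_is_empty=False, on_unix=True))
```

The helper only picks a starting point; as the reply says, at large sizes the practical answer is to run both loaders on the actual data and compare.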

>
> Thanks a lot in advance!
> - Charles
>

	Andy


TDBLoader vs TDBloader2

Posted by Charles Li <ch...@gmail.com>.
Hi, All:

1. Is there any difference between the resultant data stores from the same RDF file using TDBLoader and TDBLoader2?

2. What are some of the highlights of differences between TDBLoader and TDBLoader2?

3. What are the guidelines on when to use TDBLoader and when to use TDBLoader2?

Thanks a lot in advance!
- Charles

Re: jena framework performance

Posted by Andy Seaborne <an...@apache.org>.
Hi Martin,

Jena provides APIs to all SPARQL functionality.  You can use any SPARQL 
compliant store.  4Store is one such store (nearly complete SPARQL 1.1 - 
no property paths currently IIRC).  It was built to handle data in the 
volume you're talking about.
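
To illustrate why a SPARQL-compliant store is interchangeable from the client's point of view, here is a minimal Python sketch of what a query looks like on the wire under the SPARQL Protocol, which carries the query text in a `query` parameter. The endpoint URL, the example resource IRI, and the helper name `sparql_get_url` are hypothetical; a Jena client would normally go through Jena's own remote query support rather than build the URL by hand.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def sparql_get_url(endpoint, query):
    """Build a SPARQL Protocol GET request URL for a query."""
    # The SPARQL Protocol passes the query text in the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint; substitute the address of the store's SPARQL service.
ENDPOINT = "http://localhost:8000/sparql/"
# A simple lookup: all properties and values of one resource.
QUERY = "SELECT ?p ?o WHERE { <http://example.org/thing> ?p ?o } LIMIT 10"

url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Because only the endpoint address is store-specific, the same request could be pointed at any store that implements the protocol.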

	Andy


On 06/08/13 14:33, Martin Vassilev wrote:
> OK thanks, I see that Jena TDB/SDB won't fit my needs. Is it possible
> to use Jena with 4store, i.e. take advantage of 4store's performance and
> scalability and Jena's API?
>
> Martin
>


Re: jena framework performance

Posted by Martin Vassilev <mr...@gmail.com>.
OK thanks, I see that Jena TDB/SDB won't fit my needs. Is it possible
to use Jena with 4store, i.e. take advantage of 4store's performance and
scalability and Jena's API?

Martin


Re: jena framework performance

Posted by Andy Seaborne <an...@apache.org>.
On 05/08/13 23:01, Dave Reynolds wrote:
> On 05/08/13 14:15, Martin Vassilev wrote:
>> Hi all,
>>
>> I would like to ask you a few questions about Jena.
>>
>> I read in the documentation that
>> *Note:* Although OWL version 1.1 is now a W3C recommendation, Jena's
>> support for OWL 1.1 features is limited
>
> I assume you mean OWL 2; if I recall correctly, that was called OWL 1.1
> for a while before the full extent of the changes was apparent.
>
>> 1. Can you tell me exactly what is not supported (or what is supported)
>> in OWL 1.1, and which version of OWL is fully supported?
>
> Jena provides convenience methods for the complete OWL 1 language. It
> provides built-in rule-based inference for subsets of the OWL Full
> dialect of OWL 1, as described in [1]. Third-party tools like Pellet
> provide complete OWL DL implementations compatible with Jena.
>
> Jena currently has no direct support for any of the OWL 2 extensions
> beyond OWL 1. However, since OWL 2 can be fully encoded in RDF (the
> normative syntax) and Jena can handle any RDF, you can read and
> write any OWL 2 ontology; you just need to do a little more work.
> Certainly people have built commercial OWL 2 tools on top of Jena.
>
> There is no built-in inference support for any of OWL 2 beyond OWL 1. To
> find out exactly what Pellet et al. provide for OWL 2, you would need to
> check with them directly.
>
>> 2. Can Jena handle 2 or 3 billion triples using TDB?
>
> It depends.
>
> There is no hard limit on TDB size at that scale, so with a big enough
> machine you should be able to load that. It is approaching the sizes where
> things can get slow, though. So whether query performance will be
> adequate for you will depend on your data, your queries and your
> machine. The only way to be sure is to try it.

It will take a long time to load and will only be able to answer simple
queries (basically, look up a resource by URI or inverse functional
property and get the values of the properties of that object).

>
>> 3. Is it relatively easy to swap TDB for SDB?
>
> Relatively, but SDB is not advised for new projects unless you have a
> strong need for it, and I very much doubt it would cope with 2-3 billion
> triples.

SDB will not work at 2-3 billion triples.

If you design for SPARQL as your interface, you can use a clustered store
like 4Store.  It's not full SPARQL 1.1, but it's close, is open source
(GPLv2), and runs on multiple machines for scaling.

>
> Dave
>
> [1] http://jena.apache.org/documentation/inference/#owl


Re: jena framework performance

Posted by Dave Reynolds <da...@gmail.com>.
On 05/08/13 14:15, Martin Vassilev wrote:
> Hi all,
>
> I would like to ask you a few questions about Jena.
>
> I read in the documentation that
> *Note:* Although OWL version 1.1 is now a W3C recommendation, Jena's
> support for OWL 1.1 features is limited

I assume you mean OWL 2; if I recall correctly, that was called OWL 1.1
for a while before the full extent of the changes was apparent.

> 1. Can you tell me exactly what is not supported (or what is supported)
> in OWL 1.1, and which version of OWL is fully supported?

Jena provides convenience methods for the complete OWL 1 language. It
provides built-in rule-based inference for subsets of the OWL Full
dialect of OWL 1, as described in [1]. Third-party tools like Pellet
provide complete OWL DL implementations compatible with Jena.

Jena currently has no direct support for any of the OWL 2 extensions
beyond OWL 1. However, since OWL 2 can be fully encoded in RDF (the
normative syntax) and Jena can handle any RDF, you can read and
write any OWL 2 ontology; you just need to do a little more work.
Certainly people have built commercial OWL 2 tools on top of Jena.
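
As a small illustration of OWL 2 being "just RDF", the Python sketch below writes one OWL 2 axiom as a plain N-Triples line. It does not use Jena; the helper `ntriple` and the property IRI `http://example.org/partOf` are hypothetical. The point is that an OWL 2 term such as `owl:AsymmetricProperty` is an ordinary IRI in a type triple, so any RDF toolkit can read or write it even without a dedicated convenience method.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def ntriple(subject, predicate, obj):
    """Format one triple of IRIs as an N-Triples line."""
    return "<{}> <{}> <{}> .".format(subject, predicate, obj)


# Hypothetical property IRI used only for this example.
PART_OF = "http://example.org/partOf"

# owl:AsymmetricProperty is an OWL 2 addition; as RDF it is just a type triple.
line = ntriple(PART_OF, RDF_TYPE, OWL + "AsymmetricProperty")
print(line)
```

The "little more work" is that the application refers to such terms by IRI itself, and no inference is attached to them.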

There is no built-in inference support for any of OWL 2 beyond OWL 1. To
find out exactly what Pellet et al. provide for OWL 2, you would need to
check with them directly.

> 2. Can Jena handle 2 or 3 billion triples using TDB?

It depends.

There is no hard limit on TDB size at that scale, so with a big enough
machine you should be able to load that. It is approaching the sizes where
things can get slow, though. So whether query performance will be
adequate for you will depend on your data, your queries and your
machine. The only way to be sure is to try it.

> 3. Is it relatively easy to swap TDB for SDB?

Relatively, but SDB is not advised for new projects unless you have a
strong need for it, and I very much doubt it would cope with 2-3 billion
triples.

Dave

[1] http://jena.apache.org/documentation/inference/#owl