Posted to users@jena.apache.org by Ewa Szwed <ew...@gmail.com> on 2014/05/13 15:13:23 UTC

Freebase data on Jena TDB

I have the following problem with my Jena TDB instance.
Last November I loaded a Freebase dump into Jena TDB; I was able to work
with it reasonably well and got quite good performance for most of my
queries.
Recently I updated my Jena TDB store with a dump from April.
Here are some numbers showing the difference between the two instances.



                          November 2013              April 2014
Full time of import       262,052 sec / 3.03 days    716,121 sec / 8.29 days
Number of triples         1,826,551,456              2,489,221,915
Index size (whole dir)    174 GB                     333 GB


My problem is that my new instance is not performing at all.
Queries that previously ran for a couple of minutes now take a couple of
hours, which is not acceptable for my business. :(
So I would like to ask whether there is a practical limit on index size for
Jena TDB. Is there anything I can do to improve its performance?
Is this significant drop in performance something to be expected, or do I
have something fundamentally wrong in my setup that I need to track down and
fix?
Please advise.
Regards,
Ewa Szwed
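
The figures quoted in the message above are easier to compare once they are
normalised per triple. The following is a minimal sketch of that arithmetic,
not part of the original post. It assumes "GB" means 10^9 bytes (the post does
not say), and the names LOADS, BYTES_PER_GB and summarize are made up for
illustration.

```python
# Figures copied from the original post; nothing here is measured independently.
LOADS = {
    "November 2013": {"seconds": 262_052, "triples": 1_826_551_456, "index_gb": 174},
    "April 2014": {"seconds": 716_121, "triples": 2_489_221_915, "index_gb": 333},
}

BYTES_PER_GB = 10**9  # assumption: the post does not say whether GB is 10^9 or 2^30


def summarize(load):
    """Return (triples loaded per second, index bytes per triple) for one load."""
    rate = load["triples"] / load["seconds"]
    bytes_per_triple = load["index_gb"] * BYTES_PER_GB / load["triples"]
    return rate, bytes_per_triple


for name, load in LOADS.items():
    rate, bpt = summarize(load)
    print(f"{name}: {rate:,.0f} triples/s, {bpt:.0f} bytes per triple")
```

Under that assumption the load rate roughly halves (about 6,970 versus about
3,476 triples per second) and the index grows from about 95 to about 134 bytes
per triple: the data grew by about 36% while the index on disk grew by about
91%.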

Re: Freebase data on Jena TDB

Posted by Ewa Szwed <ew...@gmail.com>.
Hello,
Thank you very much for your comment.
I have now gathered all the facts, and in November we did indeed use
tdbloader2 for our import.
In April I used tdbloader.
Could you please give me some more information on the updates?
If I use the tdbupdate tool after loading with tdbloader2, is the benefit of
the smaller (in theory faster) index lost?
Is there some other way to do incremental updates without losing it?
Our requirement is that we update the store after the initial load.
Ewa

---------- Forwarded message ----------
From: bwm-epimorphics <br...@epimorphics.com>
Date: 2014-05-19 11:41 GMT+01:00
Subject: Re: Freebase data on Jena TDB
To: users@jena.apache.org



On 19/05/14 11:26, Ewa Szwed wrote:

> Hi Brian - I was using tdbloader for both the November and April imports - I
> have tested it before, and for the Freebase data set it works better than
> tdbloader2.
> tdbloader2 had a faster data import phase but a much slower indexing
> phase, which makes the total import time longer than with tdbloader in my
> case.
>
Yes. For some of mine too.

The reason I asked is that, as Andy mentioned, tdbloader2 tends to
generate a significantly more compact set of files and as a result
tdb can go a bit faster.  That advantage goes away if you then update
the database.  If you are loading a tdb image and then not updating it,
it might be worth the wait for tdbloader2.

Brian



-- 
Epimorphics Ltd (http://www.epimorphics.com)

Epimorphics Ltd. is a limited company registered in England (number 7016688)
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20
6PT, UK

Re: Freebase data on Jena TDB

Posted by bwm-epimorphics <br...@epimorphics.com>.
On 19/05/14 11:26, Ewa Szwed wrote:
> Hi Brian - I was using tdbloader for both the November and April imports - I
> have tested it before, and for the Freebase data set it works better than
> tdbloader2.
> tdbloader2 had a faster data import phase but a much slower indexing
> phase, which makes the total import time longer than with tdbloader in my
> case.
Yes. For some of mine too.

The reason I asked is that, as Andy mentioned, tdbloader2 tends to
generate a significantly more compact set of files and as a result
tdb can go a bit faster.  That advantage goes away if you then update
the database.  If you are loading a tdb image and then not updating it,
it might be worth the wait for tdbloader2.

Brian
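
The thread does not say why a tdbloader2 build is more compact or why updates
erode that advantage. One plausible mechanism, assumed here rather than taken
from the messages, is block fill in B+tree indexes: a build from pre-sorted data
can pack every block full, whereas one-at-a-time inserts split full blocks and
leave them partly empty. The toy model below is not TDB code. The block
capacity, key count and function names are all hypothetical, and it says
nothing about how tdbloader itself builds its indexes.

```python
import bisect
import random


def bulk_pack(keys, capacity):
    """Sort once, then fill every leaf block to capacity (bulk-build style)."""
    ordered = sorted(keys)
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


def incremental_insert(keys, capacity):
    """Insert keys one by one, splitting a leaf block in half when it overflows."""
    blocks = [[]]
    separators = []  # separators[i] is the lowest key allowed in blocks[i + 1]
    for key in keys:
        idx = bisect.bisect_right(separators, key)
        block = blocks[idx]
        bisect.insort(block, key)
        if len(block) > capacity:
            mid = len(block) // 2
            right = block[mid:]
            del block[mid:]
            blocks.insert(idx + 1, right)
            separators.insert(idx, right[0])
    return blocks


def fill_factor(blocks, capacity):
    """Fraction of the allocated slots that actually hold a key."""
    return sum(len(b) for b in blocks) / (len(blocks) * capacity)


CAPACITY = 50  # hypothetical keys per block, not TDB's real block size
keys = list(range(20_000))
random.Random(0).shuffle(keys)  # keys arrive in random order, as updates might

packed = bulk_pack(keys, CAPACITY)
grown = incremental_insert(keys, CAPACITY)
print(f"bulk-packed: {len(packed)} blocks, fill {fill_factor(packed, CAPACITY):.2f}")
print(f"incremental: {len(grown)} blocks, fill {fill_factor(grown, CAPACITY):.2f}")
```

In this model the packed build uses exactly 400 full blocks, while the
incrementally grown structure needs noticeably more blocks for the same keys.
More blocks mean more disk and cache traffic per lookup, which is consistent
with, though not proof of, the behaviour described in this message.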



Re: Freebase data on Jena TDB

Posted by Ewa Szwed <ew...@gmail.com>.
Hi Brian - I was using tdbloader for both the November and April imports - I
have tested it before, and for the Freebase data set it works better than
tdbloader2.
tdbloader2 had a faster data import phase but a much slower indexing
phase, which makes the total import time longer than with tdbloader in my
case.



Re: Freebase data on Jena TDB

Posted by bwm-epimorphics <br...@epimorphics.com>.
How did you load the TDB store?  Is it possible you used tdbloader2 for 
the first load and tdbloader for the second?

Brian
