Posted to users@jena.apache.org by "Shri :)" <sh...@gmail.com> on 2011/09/28 16:09:04 UTC

Performance of Jena SDB with MySQL as backend in windows platform

Hi Everyone,

I am currently doing my master's thesis, for which I have to work with Jena SDB
using MySQL as the backend store. I have around 25 million triples to load,
which has taken more than 5 days on Windows, whereas according to the Berlin
Benchmark it took only 4 hours to load the same number of triples on Linux.
This has left me confused. Is the enormous difference due to the platform, or
should I do some performance tuning/optimization to improve the load time?

Kindly give your suggestions/comments.

P.S. I am using WAMP.


Thanks

Shridevika

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Andy Seaborne <an...@apache.org>.

On 28/09/11 15:39, Damian Steer wrote:
>
> On 28 Sep 2011, at 15:09, Shri :) wrote:
>
>> Hi Everyone,
>>
>> I am currently doing my master thesis wherein I have to work with Jena SDB
>> using mySQL as a backend store.
>
> You have to work with it?
>
>> I have around 25 million triples to load
>> which has taken more than 5 days to load in windows platform,
>
> I admire your persistence! How are you loading it? What type of machine is this running on?
>
> Sounds like it's seriously memory starved to me. [1] suggests increasing innodb_buffer_pool_size,
> which determines the db memory buffer. The default is very small.
>
> It's also faster to load the data and then build the indexes.
>
> Damian
>
> [1]<http://openjena.org/wiki/SDB/NotesMySQL>

Are you using the sdb bulk loader or loading via your own code?

What format is the data in?

(this Q is also on answers.semanticweb.com)

	Andy

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Damian Steer <d....@bristol.ac.uk>.

On 28 Sep 2011, at 15:09, Shri :) wrote:

> Hi Everyone,
> 
> I am currently doing my master thesis wherein I have to work with Jena SDB
> using mySQL as a backend store.

You have to work with it?

> I have around 25 million triples to load
> which has taken more than 5 days to load in windows platform,

I admire your persistence! How are you loading it? What type of machine is this running on?

Sounds like it's seriously memory starved to me. [1] suggests increasing innodb_buffer_pool_size,
which determines the db memory buffer. The default is very small.

It's also faster to load the data and then build the indexes.

Damian

[1] <http://openjena.org/wiki/SDB/NotesMySQL>
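
For readers following along, the buffer setting Damian mentions lives in the MySQL option file (my.ini on a Windows install). A minimal sketch, assuming a machine with about 8 GB of RAM as reported later in the thread; the sizes are illustrative guesses, not values recommended by the SDB notes:

```ini
[mysqld]
# Illustrative value only: gives InnoDB a much larger in-memory cache for
# table and index pages than a small default would.
innodb_buffer_pool_size = 2G

# Only matters if the tables use MyISAM rather than InnoDB (assumption:
# shown because the key buffer is also tuned later in the thread).
key_buffer_size = 512M
```

MySQL has to be restarted for option-file changes to take effect.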

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Andy Seaborne <an...@apache.org>.

On 04/10/11 09:22, Shri :) wrote:
> Hi All,
>
> I did bulk loading through the command line utility (with time) after making
> some tuning with the Mysql(buffer_pool and key buffer size) and got the
> final loading time as roughly 5 and half hours for ~24 million triples which
> seems okay to me. It took 4 hours to index this dataset.
>
> Any comments here??

Windows ...

>
> I am now querying this dataset through command utility again where the
> resulting tuples are printed along with the execution time, I would like to
> know if this execution time includes the printing time as well (which I
> would *not* prefer), kindly let me know this..

Don't print the results.
   --results=none
or "count"
or a streaming format to a file.  "text" is not streaming.

See also --repeat=N
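
When timing from your own code, the same idea (consume the results without printing them, and repeat the run) could be sketched as below. This is a minimal Java sketch under stated assumptions: QueryTimer is a hypothetical helper, the list stands in for a real result set, and reporting the fastest run is a choice made here, not a description of how the command line tools aggregate repeated runs.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: times a query by draining its results instead of printing them. */
final class QueryTimer {

    /** Consumes every row and returns how many there were; nothing is printed. */
    static long countRows(Iterator<?> rows) {
        long n = 0;
        while (rows.hasNext()) {
            rows.next();
            n++;
        }
        return n;
    }

    /** Runs the query `repeat` times and returns the fastest run in nanoseconds. */
    static long bestOfNanos(Supplier<Iterator<?>> query, int repeat) {
        if (repeat < 1) {
            throw new IllegalArgumentException("repeat must be >= 1");
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < repeat; i++) {
            long start = System.nanoTime();
            countRows(query.get());              // execute and drain, no output
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for a real result set; replace with your own query execution.
        List<String> fakeRows = List.of("row1", "row2", "row3");
        long nanos = bestOfNanos(fakeRows::iterator, 5);
        System.out.println("best of 5 runs: " + nanos + " ns");
    }
}
```

The only printing happens once, after the clock has stopped, so output cost never enters the measurement.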

	Andy




Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Shridevika Maharajan <sh...@gmail.com>.

Hi again!

Yes, my OS is Windows 7.

Shri


Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by "Shri :)" <sh...@gmail.com>.

Hi All,

I did the bulk load through the command line utility (with timing) after
tuning MySQL (buffer pool and key buffer size), and got a final loading time
of roughly five and a half hours for ~24 million triples, which seems okay to
me. It took 4 hours to index this dataset.

Any comments here?

I am now querying this dataset through the command line utility, which prints
the resulting tuples along with the execution time. I would like to know
whether this execution time includes the printing time as well (which I would
*not* prefer). Kindly let me know.

Thanks to all of you for your advice, it was very helpful to me :)

BR,
Shri



Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by "Shri :)" <sh...@gmail.com>.

Hello, sorry, my dataset is in .NT format.


Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Damian Steer <d....@bristol.ac.uk>.

On 30 Sep 2011, at 10:52, Shri :) wrote:

> Hi All,
> 
> 
> @Damian  thanks for the link, I will now try increasing the buffer_pool_size
> and carry out the loading..Will let you know how it goes.

Thanks Shri.

> @ Andy: Are you using the sdb bulk loader or loading via your own code?What
> format is the data in?

> I am using the following code, which I don't think it is very different from
> the one that you suggested, *my data is in .TTL format*
> Here is the snippet of my code:

>  //read data into the database
> InputStream inn= new FileInputStream ("dataset_70000.nt"); long start =
> System.currentTimeMillis(); model.read(inn, "localhost", "TTL");
> loadtime=ext.elapsedTime(start); // Close the database connection
> store.close(); System.out.println("Loading time: " + loadtime);

The formatting went wonky for me, but that looks fine to me.

Sanity check: try sdbload. If that's faster we have a bug somewhere.

> @Dave I think I followed the pattern suggested in the link that you gave me
> (http://openjena.org/wiki/SDB/Loading_data), the above is the snippet of my
> source code.
> And one more thing, I didn't get the idea of "Are you wrapping the load in a
> transaction to avoid auto-commit costs?", can you please elaborate a bit on
> this?? Sorry, I am relatively a novice..

Short answer: don't worry, you're not being bitten by this.

Long answer: if you were loading the data triple by triple, e.g.

   for (Statement s: statementsFromFile) {
       // Do something with s
       model.add(s);
   }

then that would be really slow. Each add is a distinct database action, and costly. Adding everything in one API call (as you have) should be fine.

Damian

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Damian Steer <d....@bristol.ac.uk>.

(at a conference with shoddy networking -- I guess my reply this morning is lost somewhere)

Sent from my iPhone

On 30 Sep 2011, at 15:10, Andy Seaborne <an...@apache.org> wrote:

> On 30/09/11 10:52, Shri :) wrote:
>> Hi All,
>> 
>> 
>> @Damian  thanks for the link, I will now try increasing the buffer_pool_size
>> and carry out the loading..Will let you know how it goes.
>> 
>> @ Andy: Are you using the sdb bulk loader or loading via your own code?What
>> format is the data in?
>> But why not use the sdbload tool? Take the source code and add whatever
>> extras timing you need (it already can print some timing info).
>> 
>> 
>> I am using the following code, which I don't think it is very different from
>> the one that you suggested, *my data is in .TTL format*
>> Here is the snippet of my code:
>> 
>> StoreDesc storeDesc = StoreDesc.read("sdb2.ttl") ; IDBConnection conn = new
>> DBConnection ( DB_URL, DB_USER, DB_PASSWD, DB ); conn.getConnection();
>> SDBConnection sdbconn = SDBFactory.createConnection( conn.getConnection()) ;
>> Store store = SDBFactory.connectStore(sdbconn, storeDesc) ; Model model=
>> SDBFactory.connectDefaultModel(store); //read data into the database
>> InputStream inn= new FileInputStream ("dataset_70000.nt"); long start =
>> System.currentTimeMillis(); model.read(inn, "localhost", "TTL");
>> loadtime=ext.elapsedTime(start); // Close the database connection
>> store.close(); System.out.println("Loading time: " + loadtime);
> 
> (Unreadable)
> 
> [
> Damian - does model.read() go via the bulkloader or is this code using one transaction per triple

Certainly should do. It would explain a lot if it didn't. I thought the readers signalled bulk loading. 

Will check.

Damian

> ]
> 
> Try putting around the load:
> store.getLoader().startBulkUpdate();
> ...
> store.getLoader().finishBulkUpdate();
> 
> 
> Using the Turtle reader for N-Triples is slightly slower - but only tens of %.
> 
>    Andy
> 

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Andy Seaborne <an...@apache.org>.

On 30/09/11 10:52, Shri :) wrote:
> Hi All,
>
>
> @Damian  thanks for the link, I will now try increasing the buffer_pool_size
> and carry out the loading..Will let you know how it goes.
>
> @ Andy: Are you using the sdb bulk loader or loading via your own code?What
> format is the data in?
> But why not use the sdbload tool? Take the source code and add whatever
> extras timing you need (it already can print some timing info).
>
>
> I am using the following code, which I don't think it is very different from
> the one that you suggested, *my data is in .TTL format*
> Here is the snippet of my code:
>
> StoreDesc storeDesc = StoreDesc.read("sdb2.ttl") ; IDBConnection conn = new
> DBConnection ( DB_URL, DB_USER, DB_PASSWD, DB ); conn.getConnection();
> SDBConnection sdbconn = SDBFactory.createConnection( conn.getConnection()) ;
> Store store = SDBFactory.connectStore(sdbconn, storeDesc) ; Model model=
> SDBFactory.connectDefaultModel(store); //read data into the database
> InputStream inn= new FileInputStream ("dataset_70000.nt"); long start =
> System.currentTimeMillis(); model.read(inn, "localhost", "TTL");
> loadtime=ext.elapsedTime(start); // Close the database connection
> store.close(); System.out.println("Loading time: " + loadtime);

(Unreadable)

[
Damian - does model.read() go via the bulkloader, or is this code using
one transaction per triple?
]

Try putting around the load:
store.getLoader().startBulkUpdate();
...
store.getLoader().finishBulkUpdate();


Using the Turtle reader for N-Triples is slightly slower - but only tens 
of %.
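
The bracketing suggested above could be wrapped in a small helper so that the finish call is not forgotten. A minimal Java sketch under stated assumptions: BulkLoader here is a hypothetical stand-in interface for whatever store.getLoader() returns, only the two method names come from this thread, and calling finish even when the load throws is a choice made for the sketch (a real loader may prefer to abort instead).

```java
/** Hypothetical stand-in for the object returned by store.getLoader() in the thread. */
interface BulkLoader {
    void startBulkUpdate();
    void finishBulkUpdate();
}

final class BulkUpdates {

    /** Runs the load between start and finish; finish is called even if the load throws. */
    static void withBulkUpdate(BulkLoader loader, Runnable load) {
        loader.startBulkUpdate();
        try {
            load.run();
        } finally {
            loader.finishBulkUpdate();
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // Recording loader used only to show the call order.
        BulkLoader recording = new BulkLoader() {
            public void startBulkUpdate()  { log.append("start;"); }
            public void finishBulkUpdate() { log.append("finish;"); }
        };
        withBulkUpdate(recording, () -> log.append("load;"));
        System.out.println(log);   // prints start;load;finish;
    }
}
```

In real code the Runnable would contain the model.read(...) call from the snippet being discussed.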

	Andy


Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Dave Reynolds <da...@gmail.com>.

On Fri, 2011-09-30 at 02:52 -0700, Shri :) wrote:

> @Dave I think I followed the pattern suggested in the link that you gave me
> (http://openjena.org/wiki/SDB/Loading_data), the above is the snippet of my
> source code.
> And one more thing, I didn't get the idea of "Are you wrapping the load in a
> transaction to avoid auto-commit costs?", can you please elaborate a bit on
> this?? Sorry, I am relatively a novice..

You want to make sure the database sees all your inserts as one large
transaction. You have explicit control over this via
model.begin()/model.commit().

However, SDB will do this for you automatically so the main thing is to
make sure you are using the bulkloader as Andy and Damian have already
said.
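
To make the auto-commit cost concrete, here is a toy Java model that only counts commits. It is a sketch under stated assumptions: CountingStore is hypothetical and is not how SDB or MySQL is implemented, and begin()/commit() merely echo the model.begin()/model.commit() calls named above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy store that only counts commits; not how SDB or MySQL is implemented. */
final class CountingStore {
    private final List<String> rows = new ArrayList<>();
    private boolean inTransaction = false;
    private int commits = 0;

    void begin() {
        inTransaction = true;
    }

    void add(String triple) {
        rows.add(triple);
        if (!inTransaction) {
            commits++;               // auto-commit: one commit per insert
        }
    }

    void commit() {
        inTransaction = false;
        commits++;                   // one commit for the whole batch
    }

    int commits() { return commits; }

    int size() { return rows.size(); }

    public static void main(String[] args) {
        List<String> triples = List.of("s1 p o .", "s2 p o .", "s3 p o .");

        CountingStore autoCommit = new CountingStore();
        for (String t : triples) {
            autoCommit.add(t);
        }

        CountingStore wrapped = new CountingStore();
        wrapped.begin();
        for (String t : triples) {
            wrapped.add(t);
        }
        wrapped.commit();

        System.out.println("auto-commit commits: " + autoCommit.commits()); // 3
        System.out.println("wrapped commits:     " + wrapped.commits());    // 1
    }
}
```

With millions of triples the first pattern pays the commit cost millions of times, while the second pays it once.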

Dave

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by "Shri :)" <sh...@gmail.com>.

Hi All,


@Damian: thanks for the link. I will now try increasing the buffer_pool_size
and carry out the loading. Will let you know how it goes.

@Andy:
> Are you using the sdb bulk loader or loading via your own code? What
> format is the data in?
> But why not use the sdbload tool? Take the source code and add whatever
> extra timing you need (it already can print some timing info).

I am using the following code, which I don't think is very different from
the one that you suggested. *My data is in .TTL format.*
Here is a snippet of my code:

   StoreDesc storeDesc = StoreDesc.read("sdb2.ttl");
   IDBConnection conn = new DBConnection(DB_URL, DB_USER, DB_PASSWD, DB);
   conn.getConnection();
   SDBConnection sdbconn = SDBFactory.createConnection(conn.getConnection());
   Store store = SDBFactory.connectStore(sdbconn, storeDesc);
   Model model = SDBFactory.connectDefaultModel(store);

   // read data into the database
   InputStream inn = new FileInputStream("dataset_70000.nt");
   long start = System.currentTimeMillis();
   model.read(inn, "localhost", "TTL");
   loadtime = ext.elapsedTime(start);

   // Close the database connection
   store.close();
   System.out.println("Loading time: " + loadtime);



@Dave: I think I followed the pattern suggested in the link that you gave me
(http://openjena.org/wiki/SDB/Loading_data); the above is a snippet of my
source code.
One more thing: I didn't understand the question "Are you wrapping the load in a
transaction to avoid auto-commit costs?". Could you please elaborate a bit on
this? Sorry, I am relatively new to this.


Any thoughts on this? Thank you very much! :)

BR,
shri









Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Andy Seaborne <an...@apache.org>.

This is exactly a copy of the answers.semanticweb.com comment.

You had replies here with suggestions and questions.

You should also search answers.semanticweb.com for related questions. 
One even has a code fragment.

But why not use the sdbload tool?  Take the source code and add whatever
extra timing you need (it already can print some timing info).

	Andy



Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by Dave Reynolds <da...@gmail.com>.

On Thu, 2011-09-29 at 09:00 +0200, Shri :) wrote: 
> Hi Again,
> 
> I supposed to evaluate the performance of few triple stores as a part of my
> thesis work (which is the specification which I cannot change
> unfortunately)one among them is Jens SDB with Mysql, I am using my own java
> code to load the data and not the command line tool,

Did you follow the patterns in [1]?
Are you wrapping the load in a transaction to avoid auto-commit costs?

> as I wanted to make
> note of the loading time.

Note that the command line tool has a --time option.

Dave

[1] http://openjena.org/wiki/SDB/Loading_data

Re: Performance of Jena SDB with MySQL as backend in windows platform

Posted by "Shri :)" <sh...@gmail.com>.

Hi Again,

I am supposed to evaluate the performance of a few triple stores as part of my
thesis work (a specification that I unfortunately cannot change); one of
them is Jena SDB with MySQL. I am using my own Java code to load the data rather
than the command line tool, as I wanted to record the loading time. My data is
in .NT format.

I have 8 GB of RAM.

Any thoughts/suggestions on this? Thanks for your help.
