Posted to users@jena.apache.org by "A. Anil SINACI" <a....@gmail.com> on 2013/01/23 15:50:06 UTC

When to call TDB.sync

Dear Jena users,

I need some advice on whether to use transactional or non-transactional
TDB and, if I use non-transactional TDB, when to call TDB.sync.

As far as I understand from the tutorials, if I never call a transaction
operation on a dataset (e.g. dataset.begin(...)), then my TDB-backed
dataset is used in non-transactional mode. With this kind of usage, I
think I need to sync the database properly so as not to leave the data
files in an inconsistent state. I hope I am correct up to this point.

I have a web application which uses Jena TDB as its data store. Multiple
users are signing in and manipulating data. My current problem is that if
the JVM hosting the servlet container shuts down (e.g. with Ctrl+C) and I
start the server again, I get a NullPointerException during dataset
creation: "TDBFactory.createDataset(...);". I tried to avoid this
exception by calling TDB.sync after each write operation, but this
decreases performance a lot.

I can afford some missing information in the persistent store; however,
the store becomes unreadable if it is closed (actually, if the JVM shuts
down) without a TDB.sync. On the other hand, switching to transactional
TDB means adopting a different architecture, which would be costly for
me. To give a small example: my bean-like classes all extend
OntClassImpl, so in their constructors I call "super(node, graph);",
which automatically adds the corresponding triples to my graph. It will
be hard to change this architecture to insert the begin and end parts of
transactions.

Any kind of advice is appreciated :)

Anil Sinaci.

Re: When to call TDB.sync

Posted by "A. Anil SINACI" <a....@gmail.com>.
Hi Tayfun,

Thanks for the advice; I will adopt a similar transactional architecture.

Best,
Anil.

On 24.01.2013 21:15, Tayfun Gökmen Halaç wrote:
> Hi Anıl,
>
> As a piece of advice: in your application, you can insert the transaction
> operations at the beginning (dataset.begin(...)) and end (dataset.commit(),
> dataset.end()) of each request-response cycle. Also, call dataset.abort()
> (the rollback operation) in a generic exception handler. That way, your
> beans can operate inside a transaction without noticing it.
>
> Best,
> Tayfun


Re: When to call TDB.sync

Posted by Tayfun Gökmen Halaç <ta...@gmail.com>.
Hi Anıl,

As a piece of advice: in your application, you can insert the transaction
operations at the beginning (dataset.begin(...)) and end (dataset.commit(),
dataset.end()) of each request-response cycle. Also, call dataset.abort()
(the rollback operation) in a generic exception handler. That way, your
beans can operate inside a transaction without noticing it.
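A minimal sketch of this request-cycle pattern, written in plain Java so it runs without Jena on the classpath. TxnDataset, Mode, RequestCycle and RecordingDataset are hypothetical names for illustration: the method names mirror the begin/commit/abort/end calls on Jena's Dataset, but this interface is not Jena's API. The handler passed in stands for bean code that knows nothing about transactions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Stand-in for Jena's ReadWrite enum (hypothetical, for illustration only).
enum Mode { READ, WRITE }

// Hypothetical stand-in for a transactional dataset. The method names mirror
// Jena's Dataset (begin/commit/abort/end), but this interface is NOT Jena's API.
interface TxnDataset {
    void begin(Mode mode);
    void commit();
    void abort();
    void end();
}

final class RequestCycle {
    // One request-response cycle: begin at the start, commit on success,
    // abort if anything in the try block throws, and always end the transaction.
    static <T> T handle(TxnDataset ds, Mode mode, Supplier<T> handler) {
        ds.begin(mode);
        try {
            T result = handler.get();   // bean code runs here, unaware of transactions
            ds.commit();
            return result;
        } catch (RuntimeException e) {
            ds.abort();                 // the "generic exception handler" step
            throw e;
        } finally {
            ds.end();                   // runs on both the success and failure paths
        }
    }
}

// Records the calls so the control flow can be checked without a real store.
final class RecordingDataset implements TxnDataset {
    final List<String> calls = new ArrayList<>();
    public void begin(Mode mode) { calls.add("begin(" + mode + ")"); }
    public void commit() { calls.add("commit"); }
    public void abort() { calls.add("abort"); }
    public void end() { calls.add("end"); }
}
```

Putting end() in the finally block is the design choice that matters here: it runs whether the handler returned normally or threw, so no request leaves a transaction open on its thread.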

Best,
Tayfun


2013/1/24 A. Anil SINACI <a....@gmail.com>

> Hi Andy,
>
> Thanks for your reply. The wise thing would be to restructure around the
> transactional mode of TDB, as you also point out.
>
> Best,
> Anil.

Re: When to call TDB.sync

Posted by "A. Anil SINACI" <a....@gmail.com>.
Hi Andy,

Thanks for your reply. The wise thing would be to restructure around the
transactional mode of TDB, as you also point out.

Best,
Anil.

On 23.01.2013 23:25, Andy Seaborne wrote:
> On 23/01/13 14:50, A. Anil SINACI wrote:
>> Dear Jena users,
>>
>> I need some advice on whether to use transactional or
>> non-transactional TDB and, if I use non-transactional TDB, when to
>> call TDB.sync.
>>
>> As far as I understand from the tutorials, if I never call a
>> transaction operation on a dataset (e.g. dataset.begin(...)), then my
>> TDB-backed dataset is used in non-transactional mode. With this kind
>> of usage, I think I need to sync the database properly so as not to
>> leave the data files in an inconsistent state. I hope I am correct up
>> to this point.
>
> Correct.
>
>>
>> I have a web application which uses Jena TDB as its data store.
>> Multiple users are signing in and manipulating data. My current
>> problem is that if the JVM hosting the servlet container shuts down
>> (e.g. with Ctrl+C) and I start the server again, I get a
>> NullPointerException during dataset creation:
>> "TDBFactory.createDataset(...);". I tried to avoid this exception by
>> calling TDB.sync after each write operation, but this decreases
>> performance a lot.
>>
>> I can afford some missing information in the persistent store;
>> however, the store becomes unreadable if it is closed (actually, if
>> the JVM shuts down) without a TDB.sync. On the other hand, switching
>> to transactional TDB means adopting a different architecture, which
>> would be costly for me. To give a small example: my bean-like classes
>> all extend OntClassImpl, so in their constructors I call
>> "super(node, graph);", which automatically adds the corresponding
>> triples to my graph. It will be hard to change this architecture to
>> insert the begin and end parts of transactions.
>
> Skipping sync() does not just lose some specific triples; it loses
> some internal data-structure data, so the database will be corrupt and
> unusable. There isn't a trade-off to be made here.
>
> You can use transactions; they should be faster than having to sync at
> every point.
>
> Open a transaction at the beginning of the web app operation and
> commit it at the end. Transactions are per-thread, so concurrency on
> the same database will work. If you do not use transactions, you need
> to manage concurrency yourself (or risk a corrupt database, if TDB's
> internal checking does not catch the problem).
>
> If you know it's a read operation, use a READ transaction. They are
> more efficient as well as being good practice.
>
> So, roughly
>
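> import com.hp.hpl.jena.query.Dataset ;    // Jena 2.x; org.apache.jena.query in Jena 3+
> import com.hp.hpl.jena.query.ReadWrite ;
>
> void webOperation(Dataset dataset, ReadWrite mode)   // ReadWrite.WRITE or ReadWrite.READ
> {
>    dataset.begin(mode) ;
>    try {
>      yourCode(dataset) ;   // no concurrency control needed.
>      dataset.commit() ;
>    } finally {
>      dataset.end() ;       // always release the transaction
>    }
> }
>
> and yourCode does not have to be transaction-aware.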
>
>     Andy
>


Re: When to call TDB.sync

Posted by Andy Seaborne <an...@apache.org>.
On 23/01/13 14:50, A. Anil SINACI wrote:
> Dear Jena users,
>
> I need some advice on whether to use transactional or
> non-transactional TDB and, if I use non-transactional TDB, when to
> call TDB.sync.
>
> As far as I understand from the tutorials, if I never call a
> transaction operation on a dataset (e.g. dataset.begin(...)), then my
> TDB-backed dataset is used in non-transactional mode. With this kind
> of usage, I think I need to sync the database properly so as not to
> leave the data files in an inconsistent state. I hope I am correct up
> to this point.

Correct.

>
> I have a web application which uses Jena TDB as its data store.
> Multiple users are signing in and manipulating data. My current
> problem is that if the JVM hosting the servlet container shuts down
> (e.g. with Ctrl+C) and I start the server again, I get a
> NullPointerException during dataset creation:
> "TDBFactory.createDataset(...);". I tried to avoid this exception by
> calling TDB.sync after each write operation, but this decreases
> performance a lot.
>
> I can afford some missing information in the persistent store;
> however, the store becomes unreadable if it is closed (actually, if
> the JVM shuts down) without a TDB.sync. On the other hand, switching
> to transactional TDB means adopting a different architecture, which
> would be costly for me. To give a small example: my bean-like classes
> all extend OntClassImpl, so in their constructors I call
> "super(node, graph);", which automatically adds the corresponding
> triples to my graph. It will be hard to change this architecture to
> insert the begin and end parts of transactions.

Skipping sync() does not just lose some specific triples; it loses some
internal data-structure data, so the database will be corrupt and
unusable. There isn't a trade-off to be made here.

You can use transactions; they should be faster than having to sync at
every point.

Open a transaction at the beginning of the web app operation and commit
it at the end. Transactions are per-thread, so concurrency on the same
database will work. If you do not use transactions, you need to manage
concurrency yourself (or risk a corrupt database, if TDB's internal
checking does not catch the problem).
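To make "transactions are per-thread" concrete, here is a toy model in plain Java; it is not TDB's implementation. Each thread keeps its own transaction state in a ThreadLocal, and a single permit allows only one write transaction at a time, an assumption modeled on the single-writer, multiple-reader behaviour TDB describes for its transactions. PerThreadTxn and its methods are hypothetical names.

```java
import java.util.concurrent.Semaphore;

// Toy model, NOT TDB's implementation. Each thread sees only its own
// transaction; a single permit serializes writers (assumed single-writer model).
final class PerThreadTxn {
    enum Mode { READ, WRITE }

    private final ThreadLocal<Mode> current = new ThreadLocal<>();
    private final Semaphore writerPermit = new Semaphore(1);

    void begin(Mode mode) throws InterruptedException {
        if (current.get() != null) {
            throw new IllegalStateException("this thread is already in a transaction");
        }
        if (mode == Mode.WRITE) {
            writerPermit.acquire();      // a second writer waits here until the first ends
        }
        current.set(mode);               // readers never touch the permit
    }

    void end() {
        Mode mode = current.get();
        if (mode == null) return;        // no transaction on this thread: nothing to do
        current.remove();
        if (mode == Mode.WRITE) {
            writerPermit.release();
        }
    }

    boolean isInTransaction() { return current.get() != null; }

    Mode mode() { return current.get(); }
}
```

Because readers never take the permit, a READ transaction on one thread can start while another thread holds the write transaction, and neither thread sees the other's state.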

If you know it's a read operation, use a READ transaction. They are more
efficient as well as being good practice.
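One way to apply this in a web application is to pick the transaction mode from the HTTP method. The sketch below is plain Java; TxnModes and its Mode enum are hypothetical (Mode mirrors Jena's ReadWrite.READ / ReadWrite.WRITE but is not Jena's enum), and it assumes the application follows HTTP semantics, i.e. GET, HEAD and OPTIONS handlers never write.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only. Mode mirrors Jena's
// ReadWrite.READ / ReadWrite.WRITE but is not Jena's enum.
final class TxnModes {
    enum Mode { READ, WRITE }

    // Methods treated as read-only; GET, HEAD and OPTIONS are "safe" HTTP methods.
    private static final Set<String> READ_ONLY = Set.of("GET", "HEAD", "OPTIONS");

    // Assumes handlers follow HTTP semantics: read-only methods never write.
    static Mode forHttpMethod(String httpMethod) {
        return READ_ONLY.contains(httpMethod.toUpperCase(Locale.ROOT)) ? Mode.READ : Mode.WRITE;
    }
}
```

The chosen mode would then be passed to dataset.begin(...) at the start of the request; anything not known to be read-only falls back to WRITE, which is the safe default.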

So, roughly

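import com.hp.hpl.jena.query.Dataset ;    // Jena 2.x; org.apache.jena.query in Jena 3+
import com.hp.hpl.jena.query.ReadWrite ;

void webOperation(Dataset dataset, ReadWrite mode)   // ReadWrite.WRITE or ReadWrite.READ
{
    dataset.begin(mode) ;
    try {
      yourCode(dataset) ;   // no concurrency control needed.
      dataset.commit() ;
    } finally {
      dataset.end() ;       // always release the transaction
    }
}

and yourCode does not have to be transaction-aware.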

	Andy