Posted to users@jena.apache.org by Bernie Greenberg <bs...@basistech.com> on 2012/04/03 18:25:32 UTC

Re-use of datasets after committing transactions

While I still haven't got an answer from this list about whether it was
really true that one has to close (and not reuse) dataset objects after a
committed or aborted "write" transaction, I did get an answer from my code,
as it were, and a surprising one, at that.

I found that at app shutdown time, I had nothing (with respect to Jena) to
do, as I had already, in each thread which had created a dataset in
response to a request for some kind of service, closed that dataset. Since
the datasets were not drawn from an open "master" object representing the
open store, but from a static source, there is no "master" object to close
down.

If the threads all had their hands on persistent, open dataset objects,
which each (according to your documentation and my own experience) can only
be used in that one thread, I would have a difficult problem causing those
threads (which may be asleep in a web or other server) to wake up to close
the pointer (yes, there may be a "thread close time" hook or the like, but
as I have it, I don't need one).

This all seems consistent with what we have transacted here before and
consistent with my understanding of transaction semantics, and seems to
work; please let me know if you think I'm overlooking something.

Thanks
Bernie

Re: Re-use of datasets after committing transactions

Posted by Bernie Greenberg <bs...@basistech.com>.
Indeed (even though I no longer do it).

On Wed, Apr 4, 2012 at 11:07 AM, Paolo Castagna <
castagna.lists@googlemail.com> wrote:

> Hi Andy,
> thanks for the explanation, very clear and I think very useful.
>
> Andy Seaborne wrote:
> > With transactions, no cleanup after .end() is needed. (And a writer
> > doing .commit()/.abort() doesn't require .end() - it's better style to
> > always call .end() in a "finally{}" though.)
> >
> > When .commit() happens, the journal is written (append only), with a
> > commit record.  The changes are written to the main dataset at some time
> > when it's quiet.  It may be when the .commit() happens, it may not -
> > it does not matter, the bytes are on disk and the change is permanent.
>
> One might still wonder whether it is possible to call .begin(...)
> on a Dataset after a successful READ|WRITE transaction:
>
>  Location location = ...
>  Dataset dataset = TDBFactory.createDataset(location);
>
>  dataset.begin(...);
>  try {
>      ...
>  } finally {
>      dataset.end();
>  }
>
>  ...
>
>  dataset.begin(...);
>  try {
>      ...
>  } finally {
>      dataset.end();
>  }
>
> I do not see any problem with that, but I wanted to double check.
>
> Thanks,
> Paolo
>

Re: Re-use of datasets after committing transactions

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Andy,
thanks for the explanation, very clear and I think very useful.

Andy Seaborne wrote:
> With transactions, no cleanup after .end() is needed. (And a writer
> doing .commit()/.abort() doesn't require .end() - it's better style to
> always call .end() in a "finally{}" though.)
>
> When .commit() happens, the journal is written (append only), with a
> commit record.  The changes are written to the main dataset at some time
> when it's quiet.  It may be when the .commit() happens, it may not -
> it does not matter, the bytes are on disk and the change is permanent.

One might still wonder whether it is possible to call .begin(...)
on a Dataset after a successful READ|WRITE transaction:

  Location location = ...
  Dataset dataset = TDBFactory.createDataset(location);

  dataset.begin(...);
  try {
      ...
  } finally {
      dataset.end();
  }

  ...

  dataset.begin(...);
  try {
      ...
  } finally {
      dataset.end();
  }

I do not see any problem with that, but I wanted to double check.

Thanks,
Paolo

Re: Re-use of datasets after committing transactions

Posted by Bernie Greenberg <bs...@basistech.com>.
This is very, very excellent and answers all the questions I asked, could
have asked, or should have asked. Jena is as truly ACID as orange juice.
All the points in your response should be made in some form, prominently,
in the documentation.  You figure out where.

Many thanks!
Bernie

On Wed, Apr 4, 2012 at 10:48 AM, Andy Seaborne <an...@apache.org> wrote:

> On 03/04/12 17:25, Bernie Greenberg wrote:
>
>> While I still haven't got an answer from this list about whether it was
>> really true that one has to close (and not reuse) dataset objects after a
>> committed or aborted "write" transaction, I did get an answer from my
>> code,
>> as it were, and a surprising one, at that.
>>
>> I found that at app shutdown time, I had nothing (with respect to Jena) to
>> do, as I had already, in each thread which had created a dataset in
>> response to a request for some kind of service, closed that dataset. Since
>> the datasets were not drawn from an open "master" object representing the
>> open store, but from a static source, there is no "master" object to close
>> down.
>>
>> If the threads all had their hands on persistent, open dataset objects,
>> which each (according to your documentation and my own experience) can
>> only
>> be used in that one thread, I would have a difficult problem causing those
>> threads (which may be asleep in a web or other server) to wake up to close
>> the pointer (yes, there may be a "thread close time" hook or the like, but
>> as I have it, I don't need one).
>>
>> This all seems consistent with what we have transacted here before and
>> consistent with my understanding of transaction semantics, and seems to
>> work; please let me know if you think I'm overlooking something.
>>
>> Thanks
>> Bernie
>>
>>
> When a (Java in-memory object for a) Dataset is used transactionally, it
> must be used only transactionally.  I think you are only using transactions
> so no issues around here.  People using datasets "old world"
> non-transactionally get old-world semantics - they need to be sync'ed.
>
> There's no harm syncing a transactional dataset (it does not do anything).
>
> With transactions, no cleanup after .end() is needed. (And a writer doing
> .commit()/.abort() doesn't require .end() - it's better style to always call
> .end() in a "finally{}" though.)
>
> When .commit() happens, the journal is written (append only), with a
> commit record.  The changes are written to the main dataset at some time
> when it's quiet.  It may be when the .commit() happens, it may not - it does
> not matter, the bytes are on disk and the change is permanent.
>
> Any transaction starting after the .commit sees the changes, either from
> the real storage or the unflushed transaction state.  The system handles
> that.
>
> If the app exits before the journal is fully written to the main dataset
> (strictly - "is known to have been written back"), then on next startup,
> the journal is flushed and the changes have become permanent in the main
> storage.
>
> If the system crashes during write-back, then the changes are still in the
> journal - it just writes them again on next recovery.  The key point is
> that the journal contains the new state of the data (as blocks) and not
> diffs.  If it were diffs, then it would have to read the old state to
> calculate the new state.  By recording new state only, it can simply keep
> trying to write until it succeeds regardless of power cycling and crashes.
>  The journal is a sequence of idempotent changes.
>
> TDB uses write-ahead logging.  There is nothing to do on abort except
> forget about it.  There are no undo actions, no write-behind logging.
>
> Update of the storage is:
>  Write log to storage
>  sync the storage
>  Truncate log to zero.
>
> It's the truncate that records the fact all transactions have been flushed
> back to the real dataset.
>
> Which means the app has no shutdown actions to do.  Any running
> transactions implicitly abort.
>
>        Andy

Re: Re-use of datasets after committing transactions

Posted by Bernie Greenberg <bs...@basistech.com>.
Again, the same should be in the documentation. Is end() needed between the
two in this case? I understand that it is better style to use end() in any
case.

Bernie

On Wed, Apr 4, 2012 at 4:35 PM, Andy Seaborne <an...@apache.org> wrote:

> On 04/04/12 16:06, Paolo Castagna wrote:
>
>> Hi Andy,
>> thanks for the explanation, very clear and I think very useful.
>>
>> Andy Seaborne wrote:
>>
>>> With transactions, no cleanup after .end() is needed. (And a writer
>>> doing .commit()/.abort() doesn't require .end() - it's better style to
>>> always call .end() in a "finally{}" though.)
>>>
>>> When .commit() happens, the journal is written (append only), with a
>>> commit record.  The changes are written to the main dataset at some time
>>> when it's quiet.  It may be when the .commit() happens, it may not -
>>> it does not matter, the bytes are on disk and the change is permanent.
>>>
>>
>> One might still wonder whether it is possible to call .begin(...)
>> on a Dataset after a successful READ|WRITE transaction:
>>
>>   Location location = ...
>>   Dataset dataset = TDBFactory.createDataset(location);
>>
>>   dataset.begin(...);
>>   try {
>>       ...
>>   } finally {
>>       dataset.end();
>>   }
>>
>>   ...
>>
>>   dataset.begin(...);
>>   try {
>>       ...
>>   } finally {
>>       dataset.end();
>>   }
>>
>> I do not see any problem with that, but I wanted to double check.
>>
>
> Yes, the app can do that.
>
>        Andy
>

Re: Re-use of datasets after committing transactions

Posted by Andy Seaborne <an...@apache.org>.
On 04/04/12 16:06, Paolo Castagna wrote:
> Hi Andy,
> thanks for the explanation, very clear and I think very useful.
>
> Andy Seaborne wrote:
>> With transactions, no cleanup after .end() is needed. (And a writer
>> doing .commit()/.abort() doesn't require .end() - it's better style to
>> always call .end() in a "finally{}" though.)
>>
>> When .commit() happens, the journal is written (append only), with a
>> commit record.  The changes are written to the main dataset at some time
>> when it's quiet.  It may be when the .commit() happens, it may not -
>> it does not matter, the bytes are on disk and the change is permanent.
>
> One might still wonder whether it is possible to call .begin(...)
> on a Dataset after a successful READ|WRITE transaction:
>
>    Location location = ...
>    Dataset dataset = TDBFactory.createDataset(location);
>
>    dataset.begin(...);
>    try {
>        ...
>    } finally {
>        dataset.end();
>    }
>
>    ...
>
>    dataset.begin(...);
>    try {
>        ...
>    } finally {
>        dataset.end();
>    }
>
> I do not see any problem with that, but I wanted to double check.

Yes, the app can do that.

	Andy

Re: Re-use of datasets after committing transactions

Posted by Andy Seaborne <an...@apache.org>.
On 03/04/12 17:25, Bernie Greenberg wrote:
> While I still haven't got an answer from this list about whether it was
> really true that one has to close (and not reuse) dataset objects after a
> committed or aborted "write" transaction, I did get an answer from my code,
> as it were, and a surprising one, at that.
>
> I found that at app shutdown time, I had nothing (with respect to Jena) to
> do, as I had already, in each thread which had created a dataset in
> response to a request for some kind of service, closed that dataset. Since
> the datasets were not drawn from an open "master" object representing the
> open store, but from a static source, there is no "master" object to close
> down.
>
> If the threads all had their hands on persistent, open dataset objects,
> which each (according to your documentation and my own experience) can only
> be used in that one thread, I would have a difficult problem causing those
> threads (which may be asleep in a web or other server) to wake up to close
> the pointer (yes, there may be a "thread close time" hook or the like, but
> as I have it, I don't need one).
>
> This all seems consistent with what we have transacted here before and
> consistent with my understanding of transaction semantics, and seems to
> work; please let me know if you think I'm overlooking something.
>
> Thanks
> Bernie
>

When a (Java in-memory object for a) Dataset is used transactionally, it 
must be used only transactionally.  I think you are only using 
transactions so no issues around here.  People using datasets "old 
world" non-transactionally get old-world semantics - they need to be 
sync'ed.

There's no harm syncing a transactional dataset (it does not do anything).

With transactions, no cleanup after .end() is needed. (And a writer
doing .commit()/.abort() doesn't require .end() - it's better style to
always call .end() in a "finally{}" though.)
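
A minimal sketch of that begin / try / finally-end shape, reusing one object for several transactions in a row. ToyDataset and all of its methods are hypothetical stand-ins written for this illustration; they are not the Jena API, and the rule that end() without commit() discards the pending work is an assumption of the toy.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a transactional dataset. This is NOT the Jena API. */
class ToyDataset {
    private final List<String> committed = new ArrayList<>();
    private List<String> pending; // non-null only while a transaction is open

    void begin() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new ArrayList<>();
    }

    void add(String item) {
        if (pending == null) throw new IllegalStateException("no open transaction");
        pending.add(item);
    }

    void commit() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        committed.addAll(pending);
        pending = null;
    }

    /** Assumed behaviour: a no-op after commit(), otherwise discards pending work. */
    void end() {
        pending = null;
    }

    List<String> committedItems() {
        return new ArrayList<>(committed);
    }
}

class TxnScopeDemo {
    static List<String> run() {
        ToyDataset ds = new ToyDataset();

        ds.begin();
        try {
            ds.add("a");
            ds.commit();
        } finally {
            ds.end(); // harmless after commit()
        }

        ds.begin(); // same object, second transaction
        try {
            ds.add("b");
            throw new IllegalStateException("simulated failure before commit");
        } catch (IllegalStateException e) {
            // nothing to undo: the uncommitted work is simply forgotten
        } finally {
            ds.end();
        }

        ds.begin(); // and a third
        try {
            ds.add("c");
            ds.commit();
        } finally {
            ds.end();
        }
        return ds.committedItems();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [a, c]
    }
}
```

The finally block is what guarantees the transaction is closed on every path, including the one where an exception skips commit().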

When .commit() happens, the journal is written (append only), with a
commit record.  The changes are written to the main dataset at some time
when it's quiet.  It may be when the .commit() happens, it may not -
it does not matter, the bytes are on disk and the change is permanent.

Any transaction starting after the .commit sees the changes, either from 
the real storage or the unflushed transaction state.  The system handles 
that.
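
One way to picture that read path is a lookup that consults committed-but-unflushed state first and falls back to the main storage. This is only an illustrative sketch under that assumption; OverlayReadDemo and its fields are invented names, and it does not claim to mirror TDB's internals.

```java
import java.util.HashMap;
import java.util.Map;

class OverlayReadDemo {
    final Map<Integer, String> mainStorage = new HashMap<>();
    final Map<Integer, String> unflushed = new HashMap<>(); // committed, not yet written back

    /** Records a committed block; the main storage is not touched yet. */
    void commit(int blockId, String newState) {
        unflushed.put(blockId, newState);
    }

    /** Readers see unflushed committed state first, then the main storage. */
    String read(int blockId) {
        String pending = unflushed.get(blockId);
        return pending != null ? pending : mainStorage.get(blockId);
    }

    /** Write-back: copy the committed blocks into the main storage. */
    void flush() {
        mainStorage.putAll(unflushed);
        unflushed.clear();
    }

    static String demo() {
        OverlayReadDemo d = new OverlayReadDemo();
        d.mainStorage.put(7, "old");
        String before = d.read(7);
        d.commit(7, "new");
        String afterCommit = d.read(7); // served from the unflushed state
        d.flush();
        String afterFlush = d.read(7);  // served from the main storage
        return before + "," + afterCommit + "," + afterFlush;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints old,new,new
    }
}
```

The reader gets the same answer before and after the flush, which is why the timing of the write-back does not matter to the application.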

If the app exits before the journal is fully written to the main dataset 
(strictly - "is known to have been written back"), then on next startup, 
the journal is flushed and the changes have become permanent in the main 
storage.

If the system crashes during write-back, then the changes are still in 
the journal - it just writes them again on next recovery.  The key point 
is that the journal contains the new state of the data (as blocks) and 
not diffs.  If it were diffs, then it would have to read the old state 
to calculate the new state.  By recording new state only, it can simply 
keep trying to write until it succeeds regardless of power cycling and 
crashes.  The journal is a sequence of idempotent changes.
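
The idempotence argument can be sketched with a toy journal whose entries hold the full new contents of a block. The class and method names here are made up for the illustration, and the "crash" is simulated by stopping the replay early; nothing below is taken from the TDB code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JournalReplayDemo {
    /** One journal entry: the complete new contents of a block (not a diff). */
    static final class BlockWrite {
        final int blockId;
        final String newState;

        BlockWrite(int blockId, String newState) {
            this.blockId = blockId;
            this.newState = newState;
        }
    }

    /** Replays the journal; stops early after maxWrites entries to mimic a crash. */
    static void replay(List<BlockWrite> journal, Map<Integer, String> storage, int maxWrites) {
        int written = 0;
        for (BlockWrite w : journal) {
            if (written == maxWrites) return; // simulated crash during write-back
            storage.put(w.blockId, w.newState); // overwrite: never reads the old state
            written++;
        }
    }

    static Map<Integer, String> demo() {
        Map<Integer, String> storage = new TreeMap<>();
        storage.put(0, "old-0");
        storage.put(1, "old-1");
        storage.put(2, "old-2");

        List<BlockWrite> journal = new ArrayList<>();
        journal.add(new BlockWrite(1, "new-1"));
        journal.add(new BlockWrite(2, "new-2"));

        replay(journal, storage, 1);                 // crash after the first block
        replay(journal, storage, Integer.MAX_VALUE); // recovery writes everything again
        replay(journal, storage, Integer.MAX_VALUE); // a further replay changes nothing
        return storage;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {0=old-0, 1=new-1, 2=new-2}
    }
}
```

Had the entries been diffs such as "append X to block 1", the second replay would have applied the change twice; whole-block overwrites avoid that.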

TDB uses write-ahead logging.  There is nothing to do on abort except 
forget about it.  There are no undo actions, no write-behind logging.

Update of the storage is:
   1. Write the log's contents to the main storage.
   2. Sync the storage.
   3. Truncate the log to zero.

It's the truncate that records the fact all transactions have been 
flushed back to the real dataset.
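
The three steps can be sketched against ordinary files. The record layout (a 4-byte block id followed by a fixed 8-byte block), the file names, and every method name are assumptions made for this illustration; TDB's real journal format is different.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class WriteBackDemo {
    static final int BLOCK_SIZE = 8;               // assumed fixed block size
    static final int RECORD_SIZE = 4 + BLOCK_SIZE; // block id + block bytes

    /** Appends one record (block id, full new block contents) and syncs the journal. */
    static void appendToJournal(Path journal, int blockId, String block) throws IOException {
        byte[] bytes = block.getBytes(StandardCharsets.US_ASCII);
        if (bytes.length != BLOCK_SIZE) throw new IllegalArgumentException("block must be 8 bytes");
        ByteBuffer rec = ByteBuffer.allocate(RECORD_SIZE);
        rec.putInt(blockId).put(bytes).flip();
        try (FileChannel ch = FileChannel.open(journal, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
            while (rec.hasRemaining()) ch.write(rec);
            ch.force(true);
        }
    }

    /** Step 1: copy journal blocks into storage. Step 2: sync. Step 3: truncate. */
    static void writeBack(Path journal, Path storage) throws IOException {
        try (FileChannel log = FileChannel.open(journal,
                     StandardOpenOption.READ, StandardOpenOption.WRITE);
             FileChannel store = FileChannel.open(storage,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer rec = ByteBuffer.allocate(RECORD_SIZE);
            for (long pos = 0; pos + RECORD_SIZE <= log.size(); pos += RECORD_SIZE) {
                rec.clear();
                while (rec.hasRemaining()) {
                    if (log.read(rec, pos + rec.position()) < 0) {
                        throw new IOException("short journal read");
                    }
                }
                rec.flip();
                long target = (long) rec.getInt() * BLOCK_SIZE; // block offset in storage
                while (rec.hasRemaining()) {
                    store.write(rec, target + rec.position() - 4);
                }
            }
            store.force(true); // the blocks must be durable before the log is dropped
            log.truncate(0);   // an empty log means "everything has been written back"
            log.force(true);
        }
    }

    static String demo() {
        try {
            Path dir = Files.createTempDirectory("writeback-demo");
            Path journal = dir.resolve("journal.bin");
            Path storage = dir.resolve("storage.bin");
            appendToJournal(journal, 1, "BBBBBBBB");
            appendToJournal(journal, 0, "AAAAAAAA");
            writeBack(journal, storage);
            writeBack(journal, storage); // empty journal: nothing left to do
            String data = new String(Files.readAllBytes(storage), StandardCharsets.US_ASCII);
            return data + ":" + Files.size(journal);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints AAAAAAAABBBBBBBB:0
    }
}
```

The ordering is the point: if the process dies anywhere before the truncate, the journal is still intact and the next startup can run the same write-back again.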

Which means the app has no shutdown actions to do.  Any running 
transactions implicitly abort.

	Andy