Posted to derby-dev@db.apache.org by Oystein Grovlen - Sun Norway <Oy...@Sun.COM> on 2007/04/13 14:28:42 UTC

Lifetime of Blob/Clob objects, compatibility issues

As part of the work on using locators for Blob/Clob, we will need to
be able to determine when to garbage collect information about locators.
The JDBC spec says the following about lifetime of Blobs [JDBC 4 spec,
section 16.3.1]:

     For locator based implementations, Blob, Clob and NClob objects
     remain valid for at least the duration of the transaction in which
     they are created, unless their free method is invoked or they are
     garbage collected.

     For implementations that fully materialize the Large Object (LOB),
     the Blob, Clob and NClob objects will remain valid until such time
     as the free method is called or the LOB is garbage collected.

     Portable applications should not depend upon the LOB persisting
     past the end of the transaction.

Note that the current Derby behavior diverges from this spec in
several aspects, and I would say the current behavior is quite a mess.
Some observations I have made:

    - In embedded, objects for short LOB values are valid after
      transaction commit, objects for long values are not.  The same
      holds after the connection is closed.
    - On the client, objects are generally valid after commit.
    - On the client, length() may be called on Clob objects after the
      connection is closed.  However, other methods are not allowed.
    - On the client, no operations can be performed on a Blob after
      connection is closed.
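
Observations like these can be reproduced with a small probe that is run
once before and once after commit (or close) on the same object.  The
sketch below is only an illustration: LobProbe and isAccessible are
hypothetical names, not part of Derby or JDBC, and it assumes the driver
reports an invalid Blob by throwing SQLException.

```java
import java.sql.Blob;
import java.sql.SQLException;

/** Hypothetical helper for experiments; not part of Derby or JDBC. */
final class LobProbe {
    private LobProbe() {}

    /**
     * Returns true if length() succeeds on the given Blob and false if
     * the call is rejected with an SQLException.
     */
    static boolean isAccessible(Blob blob) {
        try {
            blob.length();
            return true;
        } catch (SQLException e) {
            return false;
        }
    }
}
```

A single method is not conclusive.  As the Clob observation above shows,
length() may still work when other methods do not, so a fuller probe
would also try calls such as getBytes().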

In my opinion, we should strive to get the same behavior regardless
of framework or LOB size.  The plan is to change the lifetime of LOB
objects to correspond to the lifetime of the transaction in which they
were created.  A life beyond the close of the connection is not
possible on the client side, and an implementation where LOB objects
are valid after commit will be pretty complex, both generally on the
client side and for large LOBs in embedded.

Changing the lifetime of Blob/Clob objects may cause compatibility
issues for existing applications.  Client applications that currently
access LOB objects after commit, and embedded applications that access
small LOB objects after commit, will have to be changed.  Do people
think this is acceptable?
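
To give an idea of the kind of change an affected application might
make: one option is to copy the value out of the Blob while the
transaction is still open and keep only the copy afterwards.  The sketch
below is an assumption about how that could look, not a recommendation
from the Derby documentation.  LobCopy and toBytesAndFree are
hypothetical names, and copying into a byte array is only sensible for
values that fit comfortably in memory.

```java
import java.sql.Blob;
import java.sql.SQLException;

/** Hypothetical helper; not part of Derby or JDBC. */
final class LobCopy {
    private LobCopy() {}

    /**
     * Copies the whole Blob into a byte array and then releases the
     * Blob with free().  Intended to be called before commit.
     */
    static byte[] toBytesAndFree(Blob blob) throws SQLException {
        try {
            long length = blob.length();
            if (length > Integer.MAX_VALUE) {
                throw new SQLException("Blob too large for a byte array: " + length);
            }
            // JDBC positions are 1-based; avoid getBytes() on an empty value.
            return length == 0 ? new byte[0] : blob.getBytes(1, (int) length);
        } finally {
            blob.free();
        }
    }
}
```

The returned array does not depend on the transaction or the
connection, so it can be used after commit regardless of how the driver
implements Blob.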

-- 
Øystein

Re: Lifetime of Blob/Clob objects, compatibility issues

Posted by Oystein Grovlen - Sun Norway <Oy...@Sun.COM>.
Daniel John Debrunner wrote:
 > Oystein Grovlen - Sun Norway wrote:
...
 >> Some observations I have made:
 >>
 >>    - In embedded, objects for short LOB values are valid after
 >>      transaction commit, objects for long values are not.  The same
 >>      holds after the connection is closed.
 > seems to be within spec (lob is fully materialized)

The problem I have with this behavior is that the lob is not always
fully materialized.  Hence, based on the size of the LOB, the behavior
is different. My thought was that this is not in accordance with the
spec which says:

     An implementation of a Blob, Clob or NClob object may either be
     locator based or result in the object being fully materialized on
     the client.

However, I see that since it is talking about the "implementation of a
Blob [...] object", one may argue that it is OK for different objects to
have different implementations.  (My first interpretation was that
this said that the driver must either be locator based or fully
materialize, and not both.)

Regardless of spec, I am not very comfortable with an implementation
where the behavior is not uniform and where it is very difficult for a
programmer to know which behavior to expect.  (E.g., I think the max.
size for a LOB to be fully materialized will depend on the page size
of the underlying table.)

 >
 >>    - On the client, objects are generally valid after commit.
 > seems to be within spec (lob is fully materialized?)

I agree, since the lob is currently fully materialized, this is
according to the spec.

 >
 >>    - On the client, length() may be called on Clob objects after the
 >>      connection is closed.  However, other methods are not allowed.
 >>    - On the client, no operations can be performed on a Blob after
 >>      connection is closed.
 > seems inconsistent and against the spec if lobs are fully materialized
 > but within the spec if the lobs are locator based.

Right, in the case of fully materialized lob objects, the spec does
not distinguish between after commit and after close of connection.

 >> In my opinion, we should strive to get the same behavior regardless
 >> of framework or LOB size.  The plan is to change the lifetime of LOB
 >> objects to correspond to the lifetime of the transaction in which they
 >> were created.  A life beyond the close of the connection is not
 >> possible on the client side, and an implementation where LOB objects
 >> are valid after commit will be pretty complex, both generally on the
 >> client side and for large LOBs in embedded.
 >
 > For 10.3 are client LOBs locator based and embedded non-locator based?

Client LOBs will be locator based.  For embedded, I am not sure what
to call it.  If non-locator based means fully materialized, that will
not be the case, since values will only be materialized if they are
short or if they are altered.  (Being able to alter LOB values is new
functionality implemented by Anurag for 10.3).

I think it would be best if client and embedded had the same behavior,
where Blob/Clob objects are seen as proxies for the underlying LOB
column value.  These proxies may materialize (part of) a value, but
that is an implementation detail that should not concern programmers.

The metadata method locatorsUpdateCopy() will return true for both
client and embedded, since you will need to call updateRow() in order
for changes to LOB values to be reflected in the database.
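
As a rough illustration of what that means for application code, the
sketch below modifies a Blob obtained from the current row and writes it
back only when DatabaseMetaData.locatorsUpdateCopy() reports that
changes are made on a copy.  LobUpdateSketch and overwritePrefix are
hypothetical names, and the sketch assumes an updatable ResultSet that
is positioned on a row.

```java
import java.sql.Blob;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical helper; not part of Derby or JDBC. */
final class LobUpdateSketch {
    private LobUpdateSketch() {}

    /**
     * Overwrites the first bytes of the Blob in the given column of the
     * current row.  Returns true if the change was written back to the
     * row explicitly.
     */
    static boolean overwritePrefix(ResultSet rs, DatabaseMetaData metaData,
                                   int column, byte[] data) throws SQLException {
        Blob blob = rs.getBlob(column);
        blob.setBytes(1, data); // JDBC positions are 1-based

        boolean changesMadeOnCopy = metaData.locatorsUpdateCopy();
        if (changesMadeOnCopy) {
            // The Blob object holds a copy, so push it back to the row.
            rs.updateBlob(column, blob);
            rs.updateRow();
        }
        return changesMadeOnCopy;
    }
}
```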

Another aspect here is that one needs to garbage-collect the
materialized values.  For short values that is straightforward since
it is all in memory and Java's garbage-collection will take care of
it.  When long values are altered, a temporary file will be created to
store the new version of the LOB.  Such files need to be deleted.  If
the LOB object is not going to be valid after commit, the deletion can
happen as part of auto-commit.  If not, the deletion will need to wait
until the LOB object is not referenced anymore.  That would make
things a bit more complicated.
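
The simpler of the two options can be sketched as follows.  This is not
Derby's actual implementation; TransactionLobFiles and its methods are
hypothetical names that only illustrate tying the temporary files to the
transaction, so that all of them can be deleted when it ends.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not Derby's actual implementation. */
final class TransactionLobFiles {
    private final List<Path> tempFiles = new ArrayList<>();

    /** Writes a modified LOB value to a temporary file and tracks the file. */
    Path store(byte[] newValue) throws IOException {
        Path file = Files.createTempFile("lob", ".tmp");
        Files.write(file, newValue);
        tempFiles.add(file);
        return file;
    }

    /**
     * Called when the transaction ends.  No LOB object is valid after
     * this point, so every tracked file can be deleted.  Returns the
     * number of files that were removed.
     */
    int endTransaction() throws IOException {
        int deleted = 0;
        for (Path file : tempFiles) {
            if (Files.deleteIfExists(file)) {
                deleted++;
            }
        }
        tempFiles.clear();
        return deleted;
    }
}
```

If LOB objects were allowed to outlive the transaction, a list like this
would not be enough: each file would have to stay until its own LOB
object became unreachable, which requires tracking individual objects
rather than transactions.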


 >
 > In 10.2 client LOBs are fully materialized, right?

Yes.

 >
 >> Changing the lifetime of Blob/Clob objects may cause compatibility
 >> issues for existing applications.  Client applications that currently
 >> access LOB objects after commit, and embedded applications that access
 >> small LOB objects after commit, will have to be changed.  Do people
 >> think this is acceptable?
 >
 > I guess I'm unclear at this point, how much of the behaviour will be
 > changing due to the switch to locator LOBs and how much due to cleanup.
 > Is it possible to have a summary of behaviour compared to 10.2?

Client:

    10.2: A lob object is fully accessible until the connection is closed.
          (After connection is closed, some operations are still
          available).

    10.3: A lob object is not accessible after the transaction has been
          committed.

Embedded:

    10.2: A lob object for a short lob value is accessible as long as
          it is referenced.
          A lob object for a long value is not accessible after the
          transaction has been committed.

    10.3: A lob object is not accessible after the transaction has
          been committed.

I looked in the Derby documentation to see what it said about this,
and the only thing I found was:

     Recommendations: Because the lifespan of a java.sql.Blob or
     java.sql.Clob ends when the transaction commits, turn off
     auto-commit with the java.sql.Blob or java.sql.Clob features.

In other words, we do not currently tell our users that it is possible
to access the Blob/Clob objects after transaction commit.
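
For reference, the documented recommendation could be followed along
these lines.  This is a minimal sketch under the assumption that all
LOB access happens inside the supplied callback; LobTransactions,
LobWork and inTransaction are hypothetical names, not Derby or JDBC API.

```java
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper; not part of Derby or JDBC. */
final class LobTransactions {
    private LobTransactions() {}

    /** Work that reads or writes LOB objects on the given connection. */
    interface LobWork<T> {
        T run(Connection connection) throws SQLException;
    }

    /**
     * Runs the work with auto-commit turned off, so that Blob/Clob
     * objects obtained inside it stay within one transaction.  Commits
     * on success, rolls back on failure, and restores the previous
     * auto-commit setting afterwards.
     */
    static <T> T inTransaction(Connection connection, LobWork<T> work)
            throws SQLException {
        boolean previous = connection.getAutoCommit();
        connection.setAutoCommit(false);
        try {
            T result = work.run(connection);
            connection.commit();
            return result;
        } catch (SQLException | RuntimeException e) {
            connection.rollback();
            throw e;
        } finally {
            connection.setAutoCommit(previous);
        }
    }
}
```

Anything the callback returns should be plain data (a byte array, a
String), not the Blob or Clob itself, since the LOB object is not
expected to be usable once inTransaction has committed.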

Note also the following note in the JDBC spec:

     Portable applications should not depend upon the LOB persisting
     past the end of the transaction.

Allowing access after transaction commit may make it more difficult
to port applications from Derby to other database systems.

-- 
Øystein

Re: Lifetime of Blob/Clob objects, compatibility issues

Posted by Daniel John Debrunner <dj...@apache.org>.
Oystein Grovlen - Sun Norway wrote:
> 
> As part of the work on using locators for Blob/Clob, we will need to
> be able to determine when to garbage collect information about locators.
> The JDBC spec says the following about lifetime of Blobs [JDBC 4 spec,
> section 16.3.1]:
> 
>     For locator based implementations, Blob, Clob and NClob objects
>     remain valid for at least the duration of the transaction in which
>     they are created, unless their free method is invoked or they are
>     garbage collected.
> 
>     For implementations that fully materialize the Large Object (LOB),
>     the Blob, Clob and NClob objects will remain valid until such time
>     as the free method is called or the LOB is garbage collected.
> 
>     Portable applications should not depend upon the LOB persisting
>     past the end of the transaction.
> 
> Note that the current Derby behavior diverges from this spec in
> several aspects,

How does Derby diverge from the spec (quoted above)?

> and I would say the current behavior is quite a mess.
> Some observations I have made:
> 
>    - In embedded, objects for short LOB values are valid after
>      transaction commit, objects for long values are not.  The same
>      holds after the connection is closed.
seems to be within spec (lob is fully materialized)

>    - On the client, objects are generally valid after commit.
seems to be within spec (lob is fully materialized?)

>    - On the client, length() may be called on Clob objects after the
>      connection is closed.  However, other methods are not allowed.
>    - On the client, no operations can be performed on a Blob after
>      connection is closed.
seems inconsistent and against the spec if lobs are fully materialized 
but within the spec if the lobs are locator based.

> In my opinion, we should strive to get the same behavior regardless
> of framework or LOB size.  The plan is to change the lifetime of LOB
> objects to correspond to the lifetime of the transaction in which they
> were created.  A life beyond the close of the connection is not
> possible on the client side, and an implementation where LOB objects
> are valid after commit will be pretty complex, both generally on the
> client side and for large LOBs in embedded.

For 10.3 are client LOBs locator based and embedded non-locator based?

In 10.2 client LOBs are fully materialized, right?

> Changing the lifetime of Blob/Clob objects may cause compatibility
> issues for existing applications.  Client applications that currently
> access LOB objects after commit, and embedded applications that access
> small LOB objects after commit, will have to be changed.  Do people
> think this is acceptable?

I guess I'm unclear at this point, how much of the behaviour will be
changing due to the switch to locator LOBs and how much due to cleanup.
Is it possible to have a summary of behaviour compared to 10.2?

Thanks,
Dan.