Posted to derby-dev@db.apache.org by "Kristian Waagan (JIRA)" <ji...@apache.org> on 2010/01/19 15:28:56 UTC

[jira] Commented: (DERBY-3650) internal multiple references from different rows to a single BLOB/CLOB stream leads to various errors when second reference used.

    [ https://issues.apache.org/jira/browse/DERBY-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802275#action_12802275 ] 

Kristian Waagan commented on DERBY-3650:
----------------------------------------

I have been investigating this a bit further, and I'll try to share some of my findings.
My experiments consisted of the following high-level changes:
 a) add mechanism to clone store streams
 b) remove CloneableObject interface (and the method cloneObject)
 c) make DataValueDescriptor.getClone smarter (or more complex...)
 d) enable streaming support in the sorter
 e) add a special-case clone method for obtaining a materialized clone


a) add mechanism to clone store streams
Here I reused Mike's patch for DERBY-3650. I haven't found any problems with it yet, but I haven't investigated thoroughly.
I'm wondering whether we can optimize the cloning a little by deferring the initial buffer fill, to avoid reading bytes we might never pass on to the user. I tried this once and got a few test failures, probably because the exception is then thrown in a different place (for instance during the first InputStream.read instead of in some other method).
See (d) for an additional problem associated with this change.
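
A minimal sketch of the deferred-fill idea, using hypothetical names (StreamOpener and LazyFillInputStream are illustrations, not Derby classes). The constructor does no I/O, so any failure to open or read the source surfaces on the first read() call instead, which is the kind of shift that could explain the test failures mentioned above:

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical opener for the underlying source; opening may fail. */
interface StreamOpener {
    InputStream open() throws IOException;
}

/** Opens the source and fills its buffer on the first read, not at construction. */
class LazyFillInputStream extends InputStream {
    private final StreamOpener opener;
    private final byte[] buffer = new byte[8];
    private InputStream source; // stays null until the first read
    private int pos;
    private int limit;

    LazyFillInputStream(StreamOpener opener) {
        this.opener = opener; // no I/O happens here
    }

    @Override
    public int read() throws IOException {
        if (source == null) {
            source = opener.open(); // open errors now surface here
        }
        if (pos == limit) {
            // Buffer exhausted (or never filled): fetch the next chunk.
            limit = source.read(buffer, 0, buffer.length);
            pos = 0;
            if (limit <= 0) {
                limit = 0;
                return -1; // end of stream
            }
        }
        return buffer[pos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        if (source != null) {
            source.close();
        }
    }
}
```

With this shape, a clone that is never read costs nothing beyond the object allocation, at the price of moving the point where errors are reported.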

b) remove CloneableObject interface (and the method cloneObject)
This change in itself didn't cause any trouble when combined with (a) and (c). When combined with (d) an ASSERT was thrown, but I haven't yet investigated if it is a real problem or not.

c) make DataValueDescriptor.getClone smarter (or more complex...)
Here I made getClone return the most appropriate clone based on the state of the DVD:
 - simple types: normal clone (i.e. new SQLInteger(this.value) or new SQLClob(this.getString()))
 - source is a cloneable stream: clone the stream
 - source is a non-cloneable stream: materialize and return normal clone

Again, I have to investigate more, but there seems to be a need for a "transfer the state of that DVD to a new DVD" method. This differs from a clone in that the original DVD will be reused for another row and have its value overwritten. In this case there is no need to actually clone the source stream; we can just reuse the stream reference. This is what cloneObject does.
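
The state-based choice in getClone could be sketched roughly as follows. All types here (CloneableStream, CloneableBytesStream, LobValue) are hypothetical stand-ins rather than the Derby classes, and the toy cloneable stream simply restarts from the first byte, which may not match how a store stream clone behaves:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical marker for streams that can produce an independent copy. */
interface CloneableStream {
    InputStream cloneStream();
}

/** Toy cloneable stream: each clone reads the same bytes from the start. */
class CloneableBytesStream extends ByteArrayInputStream implements CloneableStream {
    CloneableBytesStream(byte[] data) { super(data); }

    @Override
    public InputStream cloneStream() { return new CloneableBytesStream(buf); }
}

/** Hypothetical stand-in for a stream-capable DVD such as a BLOB/CLOB value. */
class LobValue {
    private byte[] value;       // set once the value is materialized
    private InputStream stream; // set while the value is still a stream

    LobValue(byte[] value) { this.value = value; }
    LobValue(InputStream stream) { this.stream = stream; }

    boolean isMaterialized() { return value != null; }

    /** Reads the whole stream into memory and drops the stream reference. */
    void materialize() throws IOException {
        if (stream == null) {
            return;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = stream.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        value = out.toByteArray();
        stream = null;
    }

    byte[] bytes() throws IOException {
        materialize();
        return value.clone();
    }

    /** Picks the cheapest safe clone for the current state. */
    LobValue getClone() throws IOException {
        if (stream instanceof CloneableStream) {
            // Cloneable stream: clone it, so each DVD reads independently.
            return new LobValue(((CloneableStream) stream).cloneStream());
        }
        // Non-cloneable stream: materialize first (a no-op without a stream).
        materialize();
        return new LobValue(value.clone());
    }
}
```

Note that the non-cloneable branch has the side effect of materializing the source value as well, which is the behaviour a separate "transfer state" method would avoid.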

d) enable streaming support in the sorter
When I did this, I found a bug where Derby enters an infinite loop while reading a stream: DERBY-4
Another problem that surfaced is that the sorter closes the source result set immediately, before the values are actually processed / sorted. This caused the cloned streams to fail when processing them, because the associated container handle got closed. I tried naively to not close the source rs, but this caused some problems when running suites.All (asserts, lock timeouts). Maybe the sorter can be changed to make sure the source rs is closed at another point, but this seems like a potentially dangerous approach.
Instead I added a new method, described in (e).

For clarity, Derby isn't currently able to efficiently execute something like "select ... order by length(clob_column)". There are user workarounds for this problem, so I'm not sure fixing it should have a high priority at this point. Also, the LOB values cannot be used in an order by. I don't know which types of operations you can do in an order by, and whether it is possible to perform these immediately instead of first reading the source rs into a temporary holder and then applying the function later.

e) add a special-case clone method for obtaining a materialized clone
Added to make (d) work in an easy way. suites.All passed when using this in a single place (the sorter), but there might be other usages for it as well.
By default the method will simply forward to getClone, but for SQLChar and SQLBinary it will materialize the stream if required.
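
As an illustration only, the "forward by default, materialize for stream types" shape might look like the sketch below. Descriptor, IntDescriptor, Container and LobDescriptor are hypothetical names, not the Derby types, and the container is a toy that just refuses to open streams once closed. It also mimics the sorter problem from (d): a stream-backed clone stops working once its container is closed, while a materialized clone does not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for the DVD interface. */
interface Descriptor {
    Descriptor getClone();

    /** Default: forward to getClone, since most types are already in memory. */
    default Descriptor cloneMaterialized() throws IOException {
        return getClone();
    }
}

class IntDescriptor implements Descriptor {
    final int value;
    IntDescriptor(int value) { this.value = value; }

    @Override
    public Descriptor getClone() { return new IntDescriptor(value); }
}

/** Toy store container: streams can only be opened while it is open. */
class Container {
    private final byte[] data;
    private boolean open = true;
    Container(byte[] data) { this.data = data; }

    void close() { open = false; }

    InputStream openStream() throws IOException {
        if (!open) {
            throw new IOException("container closed");
        }
        return new ByteArrayInputStream(data);
    }
}

class LobDescriptor implements Descriptor {
    private final Container container; // non-null while stream-backed
    private final byte[] value;        // non-null once materialized

    LobDescriptor(Container container) { this.container = container; this.value = null; }
    private LobDescriptor(byte[] value) { this.container = null; this.value = value; }

    /** Stream-backed clone: cheap, but still depends on the container. */
    @Override
    public Descriptor getClone() {
        return value != null ? new LobDescriptor(value.clone()) : new LobDescriptor(container);
    }

    /** Reads the bytes now, so the clone survives the container being closed. */
    @Override
    public Descriptor cloneMaterialized() throws IOException {
        return new LobDescriptor(getBytes());
    }

    byte[] getBytes() throws IOException {
        if (value != null) {
            return value.clone();
        }
        try (InputStream in = container.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

A caller like the sorter would use cloneMaterialized() before closing its source, and plain getClone() everywhere the source is known to stay open.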


With all the changes combined (prototype quality, I must recheck to make sure I didn't cheat too much), only lang.StreamsTest failed (on line 243). The difference was that with the changes the value was materialized, whereas with clean trunk the stream was passed directly into store. The root cause is that I removed the "transfer value" functionality of cloneObject, and produced a real clone instead. The reason my smarter getClone method failed to produce a clone with a stream as source was that the source stream wasn't a store stream, and thus the only way to safely clone it was to materialize it.

For Derby to function and keep its current performance, I see the need for the following functionality:
 1) value clones (capable of cloning source streams when possible)
 2) forcing materialization
 3) copying state from one DVD to a new DVD

I don't think all three can be combined into one method, because it is impossible for this method to know the context in which the "clone" will be used.
It is important to keep in mind that for many of the data types there is no difference between items 1, 2, and 3.

Now, how do my changes differ from the original copyForRead method added by Mike?
DVD.copyForRead can simply return a reference to itself ('this'). Doing this is the cheapest way to copy a DVD, but it is also the way which puts the most restrictions on how it can be used. Since there will be multiple references to the same DVD, a single update or state change will affect all the code referring to that DVD.
This can be exploited for better performance in some cases, but I'm not sure if we should leave the decision to the calling code (using the public interface of DVD), or if we should either create a new method (like copyForRead) or add arguments to the getClone method.
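
To make the aliasing concern concrete, here is a toy sketch (IntCell is a hypothetical class, not a Derby type) contrasting a copyForRead that returns 'this' with a real clone:

```java
/** Hypothetical mutable value holder standing in for a DVD. */
class IntCell {
    private int value;

    IntCell(int value) { this.value = value; }

    int get() { return value; }

    void set(int value) { this.value = value; }

    /** Cheapest possible "copy": hands back the same object. */
    IntCell copyForRead() { return this; }

    /** Real clone: independent state. */
    IntCell getClone() { return new IntCell(value); }
}
```

If the original is overwritten for the next row, every holder of the copyForRead result sees the new value, while holders of the getClone result keep the old one.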
Forcing materialization can also be done explicitly by the calling code, but it wouldn't look too nice:
if (dvd instanceof StreamStorable) {
    // Assumes loadStream() is safe to call when there is no stream;
    // otherwise an additional check that the stream is non-null is needed.
    ((StreamStorable) dvd).loadStream();
}
clone = dvd.getClone();

Opinions?

> internal multiple references from different rows to a single BLOB/CLOB stream leads to various errors when second reference used.
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3650
>                 URL: https://issues.apache.org/jira/browse/DERBY-3650
>             Project: Derby
>          Issue Type: Bug
>          Components: Network Client, SQL, Store
>    Affects Versions: 10.3.3.0, 10.4.1.3
>         Environment: Mac OSX 10.4
> JDK 1.5.0_13
> Hibernate EntityManager 3.2.1
>            Reporter: Golgoth 14
>         Attachments: cloning-methods.html, derby-3650-preliminary_2_diff.txt, derby-3650-preliminary_diff.txt, derby-3650_tests_diff.txt, Derby3650EmbeddedRepro.java, Derby3650FullClientRepro.java, Derby3650FullRepro.java, Derby3650Repro.java, DerbyHibernateTest.zip, testdb.zip, traces_on_FormatIdStream_alloc.txt, UnionAll.java
>
>
> Derby + Hibernate JPA 3.2.1 problem on entity with Blob/Clob
> Hi,
> I'm using Derby in Client - Server mode with Hibernate JPA EJB 3.0.
> When a query on an entity containing a Clob and some joins on other entities is executed, an exception with the following message is thrown:
>   XJ073: The data in this BLOB or CLOB is no longer available.  The BLOB/CLOB's transaction may be committed, or its connection is closed.
> This problem occurs when the property "hibernate.max_fetch_depth" is greater than 0.
> When hibernate.max_fetch_depth=0, the query works.
> If Derby is configured in embedded mode, the query works independently of the value of hibernate.max_fetch_depth.
> On the Hibernate's documentation, the advised value of hibernate.max_fetch_depth is 3.
> Could you explain me if I made something wrong ?
> Thank you.
> Stephane

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.