You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-user@db.apache.org by Mikael Sundberg <Mi...@artificial-solutions.com> on 2008/11/25 12:56:34 UTC

how to get data out from Clob?

We are having some trouble with retrieving data from big Clob fields.
The data is about 40-50MB and it takes forever to retrieve it.  

Tried a lot of different methods, getString, getClob, getAsciiStream.
They all take forever (hours).

When checking whats running in derby we find

CALL SYSIBM.CLOBGETSUBSTRING(?, ?, ?)

A lot of times. 

How should I retrieve large CLob fields fastest?

 


Re: how to get data out from Clob?

Posted by Kristian Waagan <Kr...@Sun.COM>.
Mikael Sundberg wrote:
> Thanks for the info 
> Does the same problem exist for Blob?
>   

There was a similar problem, but the fix for it should be in 10.4.2.0.
In many cases Blob is faster than Clob, because it doesn't have to deal 
with the character encoding issue. Derby is using the modified UTF-8 
encoding, where one character can be represented by either one, two or 
three bytes. This complicates the positioning logic, and you also get 
the cost of decoding/encoding of course.

If you can easily encode/decode your character data yourself and you 
only need to simply transfer the data between the client and the server, 
you might see some improvement by using Blob instead of Clob.
Of course, if the data volume is too large to fully represent it 
in-memory on the client, it complicates things for you to go with Blob.


-- 
Kristian

> Will update my testservers and try it. 
>
> -----Original Message-----
> From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM] 
> Sent: den 25 november 2008 14:01
> To: Derby Discussion
> Subject: Re: how to get data out from Clob?
>
> Mikael Sundberg wrote:
>   
>> Hi
>> Im using derby 10.4.2 and the clientdriver.
>> I changed from 
>>                 BufferedReader r = new BufferedReader(new
>> InputStreamReader(res.getAsciiStream("data")));
>>                 for (String line = r.readLine(); line != null; line =
>> r.readLine())
>>                 {
>>                     w.println(line);
>>                 }
>>
>> To
>> 	String data = res.getString("data");
>>
>> In a recent update to our application since the getString seemed to be
>> twice as fast then. Not entirely sure what versions of derby I tested
>>     
> on
>   
>> then.
>> So bacisly I should change back now?
>>     
>
> On the client driver, it doesn't matter (with respect to the bug I'm 
> talking about). I'm afraid you're stuck with the performance problem 
> until you upgrade to a newer version.
>
>   
>> We don't want to use any unstable
>> version on production so guess I will have to solve it some other way.
>>     
>
>   
>> Il atleast try out the latest version on my testservers to see if that
>> realy is the problem.
>>   
>>     
>
> I would consider using the newest bits from the 10.4 branch. The changes
>
> after the latest release are bug fixes, not new features.
> You can download test binaries from here 
> http://dbtg.thresher.com/derby/bits/
> Note that these are bits used for the nightly testing and should *not* 
> be used in production without verification and testing.
>
> Note that the performance problem in this case is severe; the larger the
>
> Clob the worse the impact (we're talking hours instead of 
> seconds/minutes). I would strongly suggest you test with a newer, though
>
> unreleased, version.
>
>
>   


Re: how to get data out from Clob?

Posted by Kristian Waagan <Kr...@Sun.COM>.
Mikael Sundberg wrote:
> Thanks for the info 
> Does the same problem exist for Blob?
> Will update my testservers and try it.

Hi Mikael,

Did you get around to update your test servers?
If so, did you see a performance improvement you use cases?


Regards,
-- 
Kristian

> 
> -----Original Message-----
> From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM] 
> Sent: den 25 november 2008 14:01
> To: Derby Discussion
> Subject: Re: how to get data out from Clob?
> 
> Mikael Sundberg wrote:
>> Hi
>> Im using derby 10.4.2 and the clientdriver.
>> I changed from 
>>                 BufferedReader r = new BufferedReader(new
>> InputStreamReader(res.getAsciiStream("data")));
>>                 for (String line = r.readLine(); line != null; line =
>> r.readLine())
>>                 {
>>                     w.println(line);
>>                 }
>>
>> To
>> 	String data = res.getString("data");
>>
>> In a recent update to our application since the getString seemed to be
>> twice as fast then. Not entirely sure what versions of derby I tested
> on
>> then.
>> So bacisly I should change back now?
> 
> On the client driver, it doesn't matter (with respect to the bug I'm 
> talking about). I'm afraid you're stuck with the performance problem 
> until you upgrade to a newer version.
> 
>> We don't want to use any unstable
>> version on production so guess I will have to solve it some other way.
> 
>> Il atleast try out the latest version on my testservers to see if that
>> realy is the problem.
>>   
> 
> I would consider using the newest bits from the 10.4 branch. The changes
> 
> after the latest release are bug fixes, not new features.
> You can download test binaries from here 
> http://dbtg.thresher.com/derby/bits/
> Note that these are bits used for the nightly testing and should *not* 
> be used in production without verification and testing.
> 
> Note that the performance problem in this case is severe; the larger the
> 
> Clob the worse the impact (we're talking hours instead of 
> seconds/minutes). I would strongly suggest you test with a newer, though
> 
> unreleased, version.
> 
> 


RE: how to get data out from Clob?

Posted by Mikael Sundberg <Mi...@artificial-solutions.com>.
Thanks for the info 
Does the same problem exist for Blob?
Will update my testservers and try it. 

-----Original Message-----
From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM] 
Sent: den 25 november 2008 14:01
To: Derby Discussion
Subject: Re: how to get data out from Clob?

Mikael Sundberg wrote:
> Hi
> Im using derby 10.4.2 and the clientdriver.
> I changed from 
>                 BufferedReader r = new BufferedReader(new
> InputStreamReader(res.getAsciiStream("data")));
>                 for (String line = r.readLine(); line != null; line =
> r.readLine())
>                 {
>                     w.println(line);
>                 }
>
> To
> 	String data = res.getString("data");
>
> In a recent update to our application since the getString seemed to be
> twice as fast then. Not entirely sure what versions of derby I tested
on
> then.
> So bacisly I should change back now?

On the client driver, it doesn't matter (with respect to the bug I'm 
talking about). I'm afraid you're stuck with the performance problem 
until you upgrade to a newer version.

> We don't want to use any unstable
> version on production so guess I will have to solve it some other way.

> Il atleast try out the latest version on my testservers to see if that
> realy is the problem.
>   

I would consider using the newest bits from the 10.4 branch. The changes

after the latest release are bug fixes, not new features.
You can download test binaries from here 
http://dbtg.thresher.com/derby/bits/
Note that these are bits used for the nightly testing and should *not* 
be used in production without verification and testing.

Note that the performance problem in this case is severe; the larger the

Clob the worse the impact (we're talking hours instead of 
seconds/minutes). I would strongly suggest you test with a newer, though

unreleased, version.


-- 
Kristian

>  
> -----Original Message-----
> From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM] 
> Sent: den 25 november 2008 13:32
> To: Derby Discussion
> Subject: Re: how to get data out from Clob?
>
> Mikael Sundberg write:
>   
>> We are having some trouble with retrieving data from big Clob fields.

>> The data is about 40-50MB and it takes forever to retrieve it.  
>>
>> Tried a lot of different methods, getString, getClob, getAsciiStream.

>> They all take forever (hours).
>>
>> When checking whats running in derby we find
>>
>> CALL SYSIBM.CLOBGETSUBSTRING(?, ?, ?)
>>
>> A lot of times.
>>
>> How should I retrieve large CLob fields fastest?
>>
>>  
>>
>>     
>
>
> Hello Mikael,
>
> You are being hit by a serious performance bug (a lot more prominent 
> since Derby 10.3).
> I understand you are using the client driver, but which version of
Derby
>
> are you using?
>
> I believe the issue you are experiencing has been fixed in trunk and
in 
> the 10.4 branch, but a release hasn't been made after the fix was 
> committed. If you don't want to build Derby yourself, *test binaries* 
> are available if you want to test with a newer version. It should be 
> sufficient to update the server side only (i.e. derbynet.jar and 
> derby.jar). If you try it out, please remember to take a backup of
your 
> database! If you need to upgrade you database to trunk, you also need
to
>
> enable pre-release upgrades.
>
> Unfortunately, there is no way to properly work around the problem in 
> older releases. You can adjust the buffer/block size used to fetch
data 
> in the client application, but this will only help a little bit and
will
>
> probably not be sufficient.
> The optimal buffer size is dependent on the Clob content (due to the 
> UTF-8 encoding used). If your Clobs contain ASCII data, you should set

> your buffer size to 32672 (note that this is smaller than 32KB). If
you 
> are using the embedded driver, make sure you access the Clob using one

> of the streaming methods (getSubString has been fixed by now).
> There will be several changes regarding Clob performance in the next 
> feature release.
>
> We're also in the progress of adding performance regression tests for 
> LOBs, as this is a problem we should have detected a long time ago!
> If you are still seeing the problem with the newest version, I'm very 
> interested in getting more details from you.
>
>
> regards,
>   


Re: how to get data out from Clob?

Posted by Kristian Waagan <Kr...@Sun.COM>.
Mikael Sundberg wrote:
> Hi
> Im using derby 10.4.2 and the clientdriver.
> I changed from 
>                 BufferedReader r = new BufferedReader(new
> InputStreamReader(res.getAsciiStream("data")));
>                 for (String line = r.readLine(); line != null; line =
> r.readLine())
>                 {
>                     w.println(line);
>                 }
>
> To
> 	String data = res.getString("data");
>
> In a recent update to our application since the getString seemed to be
> twice as fast then. Not entirely sure what versions of derby I tested on
> then.
> So bacisly I should change back now?

On the client driver, it doesn't matter (with respect to the bug I'm 
talking about). I'm afraid you're stuck with the performance problem 
until you upgrade to a newer version.

> We don't want to use any unstable
> version on production so guess I will have to solve it some other way. 
> Il atleast try out the latest version on my testservers to see if that
> realy is the problem.
>   

I would consider using the newest bits from the 10.4 branch. The changes 
after the latest release are bug fixes, not new features.
You can download test binaries from here 
http://dbtg.thresher.com/derby/bits/
Note that these are bits used for the nightly testing and should *not* 
be used in production without verification and testing.

Note that the performance problem in this case is severe; the larger the 
Clob the worse the impact (we're talking hours instead of 
seconds/minutes). I would strongly suggest you test with a newer, though 
unreleased, version.


-- 
Kristian

>  
> -----Original Message-----
> From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM] 
> Sent: den 25 november 2008 13:32
> To: Derby Discussion
> Subject: Re: how to get data out from Clob?
>
> Mikael Sundberg write:
>   
>> We are having some trouble with retrieving data from big Clob fields. 
>> The data is about 40-50MB and it takes forever to retrieve it.  
>>
>> Tried a lot of different methods, getString, getClob, getAsciiStream. 
>> They all take forever (hours).
>>
>> When checking whats running in derby we find
>>
>> CALL SYSIBM.CLOBGETSUBSTRING(?, ?, ?)
>>
>> A lot of times.
>>
>> How should I retrieve large CLob fields fastest?
>>
>>  
>>
>>     
>
>
> Hello Mikael,
>
> You are being hit by a serious performance bug (a lot more prominent 
> since Derby 10.3).
> I understand you are using the client driver, but which version of Derby
>
> are you using?
>
> I believe the issue you are experiencing has been fixed in trunk and in 
> the 10.4 branch, but a release hasn't been made after the fix was 
> committed. If you don't want to build Derby yourself, *test binaries* 
> are available if you want to test with a newer version. It should be 
> sufficient to update the server side only (i.e. derbynet.jar and 
> derby.jar). If you try it out, please remember to take a backup of your 
> database! If you need to upgrade you database to trunk, you also need to
>
> enable pre-release upgrades.
>
> Unfortunately, there is no way to properly work around the problem in 
> older releases. You can adjust the buffer/block size used to fetch data 
> in the client application, but this will only help a little bit and will
>
> probably not be sufficient.
> The optimal buffer size is dependent on the Clob content (due to the 
> UTF-8 encoding used). If your Clobs contain ASCII data, you should set 
> your buffer size to 32672 (note that this is smaller than 32KB). If you 
> are using the embedded driver, make sure you access the Clob using one 
> of the streaming methods (getSubString has been fixed by now).
> There will be several changes regarding Clob performance in the next 
> feature release.
>
> We're also in the progress of adding performance regression tests for 
> LOBs, as this is a problem we should have detected a long time ago!
> If you are still seeing the problem with the newest version, I'm very 
> interested in getting more details from you.
>
>
> regards,
>   


RE: how to get data out from Clob?

Posted by Mikael Sundberg <Mi...@artificial-solutions.com>.
Hi
Im using derby 10.4.2 and the clientdriver.
I changed from 
                BufferedReader r = new BufferedReader(new
InputStreamReader(res.getAsciiStream("data")));
                for (String line = r.readLine(); line != null; line =
r.readLine())
                {
                    w.println(line);
                }

To
	String data = res.getString("data");

In a recent update to our application since the getString seemed to be
twice as fast then. Not entirely sure what versions of derby I tested on
then.
So bacisly I should change back now? We don't want to use any unstable
version on production so guess I will have to solve it some other way. 
Il atleast try out the latest version on my testservers to see if that
realy is the problem.

 
-----Original Message-----
From: Kristian.Waagan@Sun.COM [mailto:Kristian.Waagan@Sun.COM] 
Sent: den 25 november 2008 13:32
To: Derby Discussion
Subject: Re: how to get data out from Clob?

Mikael Sundberg write:
>
> We are having some trouble with retrieving data from big Clob fields. 
> The data is about 40-50MB and it takes forever to retrieve it.  
>
> Tried a lot of different methods, getString, getClob, getAsciiStream. 
> They all take forever (hours).
>
> When checking whats running in derby we find
>
> CALL SYSIBM.CLOBGETSUBSTRING(?, ?, ?)
>
> A lot of times.
>
> How should I retrieve large CLob fields fastest?
>
>  
>


Hello Mikael,

You are being hit by a serious performance bug (a lot more prominent 
since Derby 10.3).
I understand you are using the client driver, but which version of Derby

are you using?

I believe the issue you are experiencing has been fixed in trunk and in 
the 10.4 branch, but a release hasn't been made after the fix was 
committed. If you don't want to build Derby yourself, *test binaries* 
are available if you want to test with a newer version. It should be 
sufficient to update the server side only (i.e. derbynet.jar and 
derby.jar). If you try it out, please remember to take a backup of your 
database! If you need to upgrade you database to trunk, you also need to

enable pre-release upgrades.

Unfortunately, there is no way to properly work around the problem in 
older releases. You can adjust the buffer/block size used to fetch data 
in the client application, but this will only help a little bit and will

probably not be sufficient.
The optimal buffer size is dependent on the Clob content (due to the 
UTF-8 encoding used). If your Clobs contain ASCII data, you should set 
your buffer size to 32672 (note that this is smaller than 32KB). If you 
are using the embedded driver, make sure you access the Clob using one 
of the streaming methods (getSubString has been fixed by now).
There will be several changes regarding Clob performance in the next 
feature release.

We're also in the progress of adding performance regression tests for 
LOBs, as this is a problem we should have detected a long time ago!
If you are still seeing the problem with the newest version, I'm very 
interested in getting more details from you.


regards,
-- 
Kristian

Re: how to get data out from Clob?

Posted by Kristian Waagan <Kr...@Sun.COM>.
Mikael Sundberg write:
>
> We are having some trouble with retrieving data from big Clob fields. 
> The data is about 40-50MB and it takes forever to retrieve it.  
>
> Tried a lot of different methods, getString, getClob, getAsciiStream. 
> They all take forever (hours).
>
> When checking whats running in derby we find
>
> CALL SYSIBM.CLOBGETSUBSTRING(?, ?, ?)
>
> A lot of times.
>
> How should I retrieve large CLob fields fastest?
>
>  
>


Hello Mikael,

You are being hit by a serious performance bug (a lot more prominent 
since Derby 10.3).
I understand you are using the client driver, but which version of Derby 
are you using?

I believe the issue you are experiencing has been fixed in trunk and in 
the 10.4 branch, but a release hasn't been made after the fix was 
committed. If you don't want to build Derby yourself, *test binaries* 
are available if you want to test with a newer version. It should be 
sufficient to update the server side only (i.e. derbynet.jar and 
derby.jar). If you try it out, please remember to take a backup of your 
database! If you need to upgrade you database to trunk, you also need to 
enable pre-release upgrades.

Unfortunately, there is no way to properly work around the problem in 
older releases. You can adjust the buffer/block size used to fetch data 
in the client application, but this will only help a little bit and will 
probably not be sufficient.
The optimal buffer size is dependent on the Clob content (due to the 
UTF-8 encoding used). If your Clobs contain ASCII data, you should set 
your buffer size to 32672 (note that this is smaller than 32KB). If you 
are using the embedded driver, make sure you access the Clob using one 
of the streaming methods (getSubString has been fixed by now).
There will be several changes regarding Clob performance in the next 
feature release.

We're also in the progress of adding performance regression tests for 
LOBs, as this is a problem we should have detected a long time ago!
If you are still seeing the problem with the newest version, I'm very 
interested in getting more details from you.


regards,
-- 
Kristian