You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-user@db.apache.org by Stian Brattland <st...@brattland.no> on 2010/01/14 08:43:32 UTC

Streaming Results

Hi,

I've got a question regarding results streaming. The J/Connector for 
MySQL supports results streaming, which means
that you can stream and process rows in a ResultSet one by one. 
Normally, all rows in a ResultSet will be retrived
before you can process the ResultSet. However, i am curious as to wether 
this "feature" also exists in Derby?

In MySQL, you would do the following to stream results from the database 
as you iterate through a ResultSet:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);


Kind regards,
Stian Brattland

Re: Streaming Results

Posted by Stian Brattland <st...@brattland.no>.

Hi,

Thanks for your last comment Bernt. It's really great to hear that i 
won't receive the famous
OutOfMemoryException. Everyone, thanks a lot for your input. Have a good 
evening!

Kind regards,
Stian

Bernt M. Johnsen skrev:
> Hi again,
>
>   
>>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 14:42:47):
>>>>>>>>>>>>>                           
>> Hi,
>>
>> Thank you for your answer Knut Anders. I do understand from what you are  
>> saying that the Derby Driver
>> never will (to user the words of Brett) "materialize the entire  
>> ResultSet on the client-side before returning it
>> to the user", but only materialize as much data as the communications  
>> buffer can hold. That's fine. However,
>> imagine the following scenario:
>>
>> I run a query on a Derby Database which will return 1 million rows. The  
>> Derby Driver initially only returns
>> as much data as the communications buffer can hold. However, as i start  
>> iterating over each row in the
>> ResultSet, the Derby Driver naturally needs to retrieve the rows i am  
>> iterating over. Now, when i am on the
>> last row in the ResultSet, wouldn't that mean that the entire ResultSet  
>> is loaded into memory?
>>     
>
> No. Derby ResultSets are by default TYPE_FORWARD_ONLY and
> CONCUR_READ_ONLY, so that will not happen. The buffer will be reused
> for new "chunks" of data.
>
>   
>> What MySQL allows me to do is to run the query in a "streaming mode"  
>> (yes, might be a stupid name, but it's
>> a good description of what it actually does), which essentially only  
>> allows me to iterate forward in the
>> ResultSet. In other words, an exception is thrown if i try to invoke the  
>> previous() method on the java.sql.ResultSet
>> interface (as it does not store the data i have already iterated over in  
>> the ResultSet). I do not know the inner workings
>> of the MySQL driver, but i assume that data for rows which i already  
>> does not get stored in the ResultSet, but is
>> rather garbage collected. Thus, i will never run into a  
>> java.lang.OutOfMemoryException as the driver does not require
>> an ever increasing amount of memory to store the retrieved rows in a  
>> ResultSet.
>>
>> http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
>>
>> I have to admit that i don't really care that much about the inner  
>> workings of the driver, as long as i do not
>> receive an java.lang.OutOfMemoryException after having iterated over  
>> 100.000 rows in a ResultSet.
>>     
>
> You won't :-)
>
>   
>> Kind regards,
>> Stian
>>
>> Knut Anders Hatlen skrev:
>>     
>>> Stian Brattland <st...@brattland.no> writes:
>>>
>>>   
>>>       
>>>> Yes!
>>>>
>>>> Thank you Brett. That is excactly what i've been trying to
>>>> explain. That's the feature i've been asking about.
>>>>     
>>>>         
>>> Hi Stian,
>>>
>>> What Brett described is exactly how Derby behaves by default (except, as
>>> Bernt mentioned, we pre-fetch as many rows as we can in a 32K
>>> communications buffer, not some constant number of rows). Are you seeing
>>> a different behaviour in your application?
>>>
>>>   
>>>       
>
>

Re: Streaming Results

Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.

Hi again,

>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 14:42:47):
> Hi,
>
> Thank you for your answer Knut Anders. I do understand from what you are  
> saying that the Derby Driver
> never will (to user the words of Brett) "materialize the entire  
> ResultSet on the client-side before returning it
> to the user", but only materialize as much data as the communications  
> buffer can hold. That's fine. However,
> imagine the following scenario:
>
> I run a query on a Derby Database which will return 1 million rows. The  
> Derby Driver initially only returns
> as much data as the communications buffer can hold. However, as i start  
> iterating over each row in the
> ResultSet, the Derby Driver naturally needs to retrieve the rows i am  
> iterating over. Now, when i am on the
> last row in the ResultSet, wouldn't that mean that the entire ResultSet  
> is loaded into memory?

No. Derby ResultSets are by default TYPE_FORWARD_ONLY and
CONCUR_READ_ONLY, so that will not happen. The buffer will be reused
for new "chunks" of data.

> What MySQL allows me to do is to run the query in a "streaming mode"  
> (yes, might be a stupid name, but it's
> a good description of what it actually does), which essentially only  
> allows me to iterate forward in the
> ResultSet. In other words, an exception is thrown if i try to invoke the  
> previous() method on the java.sql.ResultSet
> interface (as it does not store the data i have already iterated over in  
> the ResultSet). I do not know the inner workings
> of the MySQL driver, but i assume that data for rows which i already  
> does not get stored in the ResultSet, but is
> rather garbage collected. Thus, i will never run into a  
> java.lang.OutOfMemoryException as the driver does not require
> an ever increasing amount of memory to store the retrieved rows in a  
> ResultSet.
>
> http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html
>
> I have to admit that i don't really care that much about the inner  
> workings of the driver, as long as i do not
> receive an java.lang.OutOfMemoryException after having iterated over  
> 100.000 rows in a ResultSet.

You won't :-)

>
> Kind regards,
> Stian
>
> Knut Anders Hatlen skrev:
>> Stian Brattland <st...@brattland.no> writes:
>>
>>   
>>> Yes!
>>>
>>> Thank you Brett. That is excactly what i've been trying to
>>> explain. That's the feature i've been asking about.
>>>     
>>
>> Hi Stian,
>>
>> What Brett described is exactly how Derby behaves by default (except, as
>> Bernt mentioned, we pre-fetch as many rows as we can in a 32K
>> communications buffer, not some constant number of rows). Are you seeing
>> a different behaviour in your application?
>>
>>   

-- 
Bernt Marius Johnsen, Staff Engineer
Database Technology Group, Sun Microsystems, Trondheim, Norway

Re: Streaming Results

Posted by Stian Brattland <st...@brattland.no>.

Hi,

Thank you for your answer Knut Anders. I do understand from what you are 
saying that the Derby Driver
never will (to user the words of Brett) "materialize the entire 
ResultSet on the client-side before returning it
to the user", but only materialize as much data as the communications 
buffer can hold. That's fine. However,
imagine the following scenario:

I run a query on a Derby Database which will return 1 million rows. The 
Derby Driver initially only returns
as much data as the communications buffer can hold. However, as i start 
iterating over each row in the
ResultSet, the Derby Driver naturally needs to retrieve the rows i am 
iterating over. Now, when i am on the
last row in the ResultSet, wouldn't that mean that the entire ResultSet 
is loaded into memory?

What MySQL allows me to do is to run the query in a "streaming mode" 
(yes, might be a stupid name, but it's
a good description of what it actually does), which essentially only 
allows me to iterate forward in the
ResultSet. In other words, an exception is thrown if i try to invoke the 
previous() method on the java.sql.ResultSet
interface (as it does not store the data i have already iterated over in 
the ResultSet). I do not know the inner workings
of the MySQL driver, but i assume that data for rows which i already 
does not get stored in the ResultSet, but is
rather garbage collected. Thus, i will never run into a 
java.lang.OutOfMemoryException as the driver does not require
an ever increasing amount of memory to store the retrieved rows in a 
ResultSet.

http://dev.mysql.com/doc/refman/5.0/en/connector-j-reference-implementation-notes.html

I have to admit that i don't really care that much about the inner 
workings of the driver, as long as i do not
receive an java.lang.OutOfMemoryException after having iterated over 
100.000 rows in a ResultSet.

Kind regards,
Stian

Knut Anders Hatlen skrev:
> Stian Brattland <st...@brattland.no> writes:
>
>   
>> Yes!
>>
>> Thank you Brett. That is excactly what i've been trying to
>> explain. That's the feature i've been asking about.
>>     
>
> Hi Stian,
>
> What Brett described is exactly how Derby behaves by default (except, as
> Bernt mentioned, we pre-fetch as many rows as we can in a 32K
> communications buffer, not some constant number of rows). Are you seeing
> a different behaviour in your application?
>
>

Re: Streaming Results

Posted by Knut Anders Hatlen <Kn...@Sun.COM>.

Stian Brattland <st...@brattland.no> writes:

> Yes!
>
> Thank you Brett. That is excactly what i've been trying to
> explain. That's the feature i've been asking about.

Hi Stian,

What Brett described is exactly how Derby behaves by default (except, as
Bernt mentioned, we pre-fetch as many rows as we can in a 32K
communications buffer, not some constant number of rows). Are you seeing
a different behaviour in your application?

-- 
Knut Anders

Re: Streaming Results

Posted by Stian Brattland <st...@brattland.no>.

Yes!

Thank you Brett. That is excactly what i've been trying to explain. 
That's the feature i've been asking about.

Kind regards,
Stian

Brett Wooldridge skrev:
> Bernt,
>
> I think the issue is that Derby will materialize the entire ResultSet 
> on the client-side before returning it to the user.  If the ResultSet 
> is one million rows, then one million rows will be transferred and 
> materialized on the client before the executeQuery() call returns to 
> the user.
>
> Some databases and drivers have the capability to return a streaming 
> ResultSet, such that rows are only transferred as ResultSet.next() is 
> called.  If the driver is clever, it can keep a bit ahead of the calls 
> to next() by transferring X number of rows at a time (where X is 
> something much smaller than a million).
>
> I do not have personal knowledge of whether Derby supports ResultSet 
> streaming (which is distinct from streaming datatypes like CLOBs or 
> BLOBs).  I just wanted to make sure the distinction in the question 
> was clear.
>
> Brett
>
>
> On Thu, Jan 14, 2010 at 7:33 PM, Bernt M. Johnsen 
> <Bernt.Johnsen@sun.com <ma...@sun.com>> wrote:
>
>     >>>>>>>>>>>> Stian Brattland wrote (2010-01-14 11:01:43):
>     > Hi,
>     >
>     > Thank you for your quick reply.
>     >
>     > I will elaborate a little on my question:
>     >
>     > I often need to retrieve a large amount of data from a remote MySQL
>     > database. However,
>     > if my application runs with a default heap size, then i will
>     quickly get
>     > some sort of heap space
>     > exception. The reason is that the ResultSet containing the retrieved
>     > data is too large. What i have
>     > done to get around this is to stream the results from the
>     database, and
>     > process rows one by one
>     > as they are streamed (for instance, storing them in a local
>     database,
>     > like Derby). Of course, things
>     > are most likely behaving more optimal than only transfering one
>     row at a
>     > time from the database
>     > (yes, some buffers are most likely involved). However, my key
>     point was
>     > that i do not have to wait
>     > for the entire ResultSet to become ready before i can start
>     iterating
>     > over the rows. Instead, rows (
>     > be it one or hundred) are retrieved as i iterate over the ResultSet.
>     >
>     > So, my question is wether the Derby Driver has this ability too?
>
>     Yes, that is exactly what the Derby driver (and all other reasonable
>     implemented JDBC drivers) will do. The driver will attempt to fill up
>     the communication buffer (32K) as long as at least one row fits into
>     it.
>
>     >
>     > Kind regards,
>     > Stian Brattland
>     >
>     >
>     > My intention with the question was not really to point out that
>     a the
>     > driver needs to retrive
>     > results in the most ineffective manner as possible.
>     >
>     > Bernt M. Johnsen skrev:
>     >> Hi,
>     >>
>     >> Some general remarks (don't remember the exact details of what
>     Derby
>     >> actually does with setFetchSize).
>     >>
>     >>
>     >>>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 08:43:32):
>     >>>>>>>>>>>>>>
>     >>> Hi,
>     >>>
>     >>> I've got a question regarding results streaming. The
>     J/Connector for
>     >>> MySQL supports results streaming, which means
>     >>> that you can stream and process rows in a ResultSet one by one.
>     >>> Normally, all rows in a ResultSet will be retrived
>     >>> before you can process the ResultSet. However, i am curious as to
>     >>> wether  this "feature" also exists in Derby?
>     >>>
>     >>
>     >> Normally, a JDBC driver will retrieve a suitable number of
>     rows, not
>     >> necessarily all, depending on various factors such as row size,
>     number
>     >> of rows resulting from the query and communication buffer size.
>     >>
>     >>
>     >>> In MySQL, you would do the following to stream results from the
>     >>> database  as you iterate through a ResultSet:
>     >>>
>     >>> stmt =
>     conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
>     java.sql.ResultSet.CONCUR_READ_ONLY);
>     >>> stmt.setFetchSize(Integer.MIN_VALUE);
>     >>>
>     >>
>     >> setFetchSize is just a hint to the driver, See
>     >>
>     >>
>     http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)
>     <http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize%28int%29>
>     >>
>     >> A well written driver will still try to do thing's optimal, such as
>     >> e.g. fill up the communication buffer with rows to reduce the
>     number
>     >> of roundtrips, regardless of how low you set the fetchSize.
>     >>
>     >> And last, why would you like to force the driver to fetch the
>     rows one
>     >> by one? The only thing you will get fromthat, is extra overhead.
>     >>
>
>     --
>     Bernt Marius Johnsen, Staff Engineer
>     Database Technology Group, Sun Microsystems, Trondheim, Norway
>
>     -----BEGIN PGP SIGNATURE-----
>     Version: GnuPG v1.4.6 (GNU/Linux)
>
>     iD8DBQFLTvL+lFBD9TXBAPARAvlGAJ9zaW2WAFG/97gneqpYZq8IWAkfagCfVEH9
>     3kGYBcg23Fbt34k9lSiqOjk=
>     =D9VP
>     -----END PGP SIGNATURE-----
>
>

Re: Streaming Results

Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.

Hi,

>>>>>>>>>>>> Brett Wooldridge wrote (2010-01-14 20:17:35):
> Bernt,
> 
> I think the issue is that Derby will materialize the entire ResultSet on the
> client-side before returning it to the user.  If the ResultSet is one
> million rows, then one million rows will be transferred and materialized on
> the client before the executeQuery() call returns to the user.

No. My point is that Derby won't materialize the entire result set on
the client side. The driver will usually materialize the number of
rows that fits into the 32K communications buffer, and when the buffer
is exhausted, it will fetch and materialize another 32K.

> Some databases and drivers have the capability to return a streaming
> ResultSet, such that rows are only transferred as ResultSet.next() is
> called.  If the driver is clever, it can keep a bit ahead of the calls to
> next() by transferring X number of rows at a time (where X is something much
> smaller than a million).
> 
> I do not have personal knowledge of whether Derby supports ResultSet
> streaming (which is distinct from streaming datatypes like CLOBs or BLOBs).
>  I just wanted to make sure the distinction in the question was
> clear.

What you call "ResultSet streaming" is as far is I know the normal
behaviour of most JDBC drivers.

> 
> Brett
> 
> 
> On Thu, Jan 14, 2010 at 7:33 PM, Bernt M. Johnsen <Be...@sun.com>wrote:
> 
> > >>>>>>>>>>>> Stian Brattland wrote (2010-01-14 11:01:43):
> > > Hi,
> > >
> > > Thank you for your quick reply.
> > >
> > > I will elaborate a little on my question:
> > >
> > > I often need to retrieve a large amount of data from a remote MySQL
> > > database. However,
> > > if my application runs with a default heap size, then i will quickly get
> > > some sort of heap space
> > > exception. The reason is that the ResultSet containing the retrieved
> > > data is too large. What i have
> > > done to get around this is to stream the results from the database, and
> > > process rows one by one
> > > as they are streamed (for instance, storing them in a local database,
> > > like Derby). Of course, things
> > > are most likely behaving more optimal than only transfering one row at a
> > > time from the database
> > > (yes, some buffers are most likely involved). However, my key point was
> > > that i do not have to wait
> > > for the entire ResultSet to become ready before i can start iterating
> > > over the rows. Instead, rows (
> > > be it one or hundred) are retrieved as i iterate over the ResultSet.
> > >
> > > So, my question is wether the Derby Driver has this ability too?
> >
> > Yes, that is exactly what the Derby driver (and all other reasonable
> > implemented JDBC drivers) will do. The driver will attempt to fill up
> > the communication buffer (32K) as long as at least one row fits into
> > it.
> >
> > >
> > > Kind regards,
> > > Stian Brattland
> > >
> > >
> > > My intention with the question was not really to point out that a the
> > > driver needs to retrive
> > > results in the most ineffective manner as possible.
> > >
> > > Bernt M. Johnsen skrev:
> > >> Hi,
> > >>
> > >> Some general remarks (don't remember the exact details of what Derby
> > >> actually does with setFetchSize).
> > >>
> > >>
> > >>>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 08:43:32):
> > >>>>>>>>>>>>>>
> > >>> Hi,
> > >>>
> > >>> I've got a question regarding results streaming. The J/Connector for
> > >>> MySQL supports results streaming, which means
> > >>> that you can stream and process rows in a ResultSet one by one.
> > >>> Normally, all rows in a ResultSet will be retrived
> > >>> before you can process the ResultSet. However, i am curious as to
> > >>> wether  this "feature" also exists in Derby?
> > >>>
> > >>
> > >> Normally, a JDBC driver will retrieve a suitable number of rows, not
> > >> necessarily all, depending on various factors such as row size, number
> > >> of rows resulting from the query and communication buffer size.
> > >>
> > >>
> > >>> In MySQL, you would do the following to stream results from the
> > >>> database  as you iterate through a ResultSet:
> > >>>
> > >>> stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
> > java.sql.ResultSet.CONCUR_READ_ONLY);
> > >>> stmt.setFetchSize(Integer.MIN_VALUE);
> > >>>
> > >>
> > >> setFetchSize is just a hint to the driver, See
> > >>
> > >>
> > http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)
> > >>
> > >> A well written driver will still try to do thing's optimal, such as
> > >> e.g. fill up the communication buffer with rows to reduce the number
> > >> of roundtrips, regardless of how low you set the fetchSize.
> > >>
> > >> And last, why would you like to force the driver to fetch the rows one
> > >> by one? The only thing you will get fromthat, is extra overhead.
> > >>
> >
> > --
> > Bernt Marius Johnsen, Staff Engineer
> > Database Technology Group, Sun Microsystems, Trondheim, Norway
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.6 (GNU/Linux)
> >
> > iD8DBQFLTvL+lFBD9TXBAPARAvlGAJ9zaW2WAFG/97gneqpYZq8IWAkfagCfVEH9
> > 3kGYBcg23Fbt34k9lSiqOjk=
> > =D9VP
> > -----END PGP SIGNATURE-----
> >
> >

-- 
Bernt Marius Johnsen, Staff Engineer
Database Technology Group, Sun Microsystems, Trondheim, Norway

Re: Streaming Results

Posted by Brett Wooldridge <br...@gmail.com>.

Bernt,

I think the issue is that Derby will materialize the entire ResultSet on the
client-side before returning it to the user.  If the ResultSet is one
million rows, then one million rows will be transferred and materialized on
the client before the executeQuery() call returns to the user.

Some databases and drivers have the capability to return a streaming
ResultSet, such that rows are only transferred as ResultSet.next() is
called.  If the driver is clever, it can keep a bit ahead of the calls to
next() by transferring X number of rows at a time (where X is something much
smaller than a million).

I do not have personal knowledge of whether Derby supports ResultSet
streaming (which is distinct from streaming datatypes like CLOBs or BLOBs).
 I just wanted to make sure the distinction in the question was clear.

Brett


On Thu, Jan 14, 2010 at 7:33 PM, Bernt M. Johnsen <Be...@sun.com>wrote:

> >>>>>>>>>>>> Stian Brattland wrote (2010-01-14 11:01:43):
> > Hi,
> >
> > Thank you for your quick reply.
> >
> > I will elaborate a little on my question:
> >
> > I often need to retrieve a large amount of data from a remote MySQL
> > database. However,
> > if my application runs with a default heap size, then i will quickly get
> > some sort of heap space
> > exception. The reason is that the ResultSet containing the retrieved
> > data is too large. What i have
> > done to get around this is to stream the results from the database, and
> > process rows one by one
> > as they are streamed (for instance, storing them in a local database,
> > like Derby). Of course, things
> > are most likely behaving more optimal than only transfering one row at a
> > time from the database
> > (yes, some buffers are most likely involved). However, my key point was
> > that i do not have to wait
> > for the entire ResultSet to become ready before i can start iterating
> > over the rows. Instead, rows (
> > be it one or hundred) are retrieved as i iterate over the ResultSet.
> >
> > So, my question is wether the Derby Driver has this ability too?
>
> Yes, that is exactly what the Derby driver (and all other reasonable
> implemented JDBC drivers) will do. The driver will attempt to fill up
> the communication buffer (32K) as long as at least one row fits into
> it.
>
> >
> > Kind regards,
> > Stian Brattland
> >
> >
> > My intention with the question was not really to point out that a the
> > driver needs to retrive
> > results in the most ineffective manner as possible.
> >
> > Bernt M. Johnsen skrev:
> >> Hi,
> >>
> >> Some general remarks (don't remember the exact details of what Derby
> >> actually does with setFetchSize).
> >>
> >>
> >>>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 08:43:32):
> >>>>>>>>>>>>>>
> >>> Hi,
> >>>
> >>> I've got a question regarding results streaming. The J/Connector for
> >>> MySQL supports results streaming, which means
> >>> that you can stream and process rows in a ResultSet one by one.
> >>> Normally, all rows in a ResultSet will be retrived
> >>> before you can process the ResultSet. However, i am curious as to
> >>> wether  this "feature" also exists in Derby?
> >>>
> >>
> >> Normally, a JDBC driver will retrieve a suitable number of rows, not
> >> necessarily all, depending on various factors such as row size, number
> >> of rows resulting from the query and communication buffer size.
> >>
> >>
> >>> In MySQL, you would do the following to stream results from the
> >>> database  as you iterate through a ResultSet:
> >>>
> >>> stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
> java.sql.ResultSet.CONCUR_READ_ONLY);
> >>> stmt.setFetchSize(Integer.MIN_VALUE);
> >>>
> >>
> >> setFetchSize is just a hint to the driver, See
> >>
> >>
> http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)
> >>
> >> A well written driver will still try to do thing's optimal, such as
> >> e.g. fill up the communication buffer with rows to reduce the number
> >> of roundtrips, regardless of how low you set the fetchSize.
> >>
> >> And last, why would you like to force the driver to fetch the rows one
> >> by one? The only thing you will get fromthat, is extra overhead.
> >>
>
> --
> Bernt Marius Johnsen, Staff Engineer
> Database Technology Group, Sun Microsystems, Trondheim, Norway
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
>
> iD8DBQFLTvL+lFBD9TXBAPARAvlGAJ9zaW2WAFG/97gneqpYZq8IWAkfagCfVEH9
> 3kGYBcg23Fbt34k9lSiqOjk=
> =D9VP
> -----END PGP SIGNATURE-----
>
>

Re: Streaming Results

Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.

>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 11:01:43):
> Hi,
>
> Thank you for your quick reply.
>
> I will elaborate a little on my question:
>
> I often need to retrieve a large amount of data from a remote MySQL  
> database. However,
> if my application runs with a default heap size, then i will quickly get  
> some sort of heap space
> exception. The reason is that the ResultSet containing the retrieved  
> data is too large. What i have
> done to get around this is to stream the results from the database, and  
> process rows one by one
> as they are streamed (for instance, storing them in a local database,  
> like Derby). Of course, things
> are most likely behaving more optimal than only transfering one row at a  
> time from the database
> (yes, some buffers are most likely involved). However, my key point was  
> that i do not have to wait
> for the entire ResultSet to become ready before i can start iterating  
> over the rows. Instead, rows (
> be it one or hundred) are retrieved as i iterate over the ResultSet.
>
> So, my question is wether the Derby Driver has this ability too?

Yes, that is exactly what the Derby driver (and all other reasonable
implemented JDBC drivers) will do. The driver will attempt to fill up
the communication buffer (32K) as long as at least one row fits into
it. 

>
> Kind regards,
> Stian Brattland
>
>
> My intention with the question was not really to point out that a the  
> driver needs to retrive
> results in the most ineffective manner as possible.
>
> Bernt M. Johnsen skrev:
>> Hi,
>>
>> Some general remarks (don't remember the exact details of what Derby
>> actually does with setFetchSize).
>>
>>   
>>>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 08:43:32):
>>>>>>>>>>>>>>                           
>>> Hi,
>>>
>>> I've got a question regarding results streaming. The J/Connector for  
>>> MySQL supports results streaming, which means
>>> that you can stream and process rows in a ResultSet one by one.   
>>> Normally, all rows in a ResultSet will be retrived
>>> before you can process the ResultSet. However, i am curious as to 
>>> wether  this "feature" also exists in Derby?
>>>     
>>
>> Normally, a JDBC driver will retrieve a suitable number of rows, not
>> necessarily all, depending on various factors such as row size, number
>> of rows resulting from the query and communication buffer size.
>>
>>   
>>> In MySQL, you would do the following to stream results from the 
>>> database  as you iterate through a ResultSet:
>>>
>>> stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
>>> stmt.setFetchSize(Integer.MIN_VALUE);
>>>     
>>
>> setFetchSize is just a hint to the driver, See
>>
>> http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)
>>
>> A well written driver will still try to do thing's optimal, such as
>> e.g. fill up the communication buffer with rows to reduce the number
>> of roundtrips, regardless of how low you set the fetchSize.
>>
>> And last, why would you like to force the driver to fetch the rows one
>> by one? The only thing you will get fromthat, is extra overhead.
>>   

-- 
Bernt Marius Johnsen, Staff Engineer
Database Technology Group, Sun Microsystems, Trondheim, Norway

Re: Streaming Results

Posted by Stian Brattland <st...@brattland.no>.

Hi,

Thank you for your quick reply.

I will elaborate a little on my question:

I often need to retrieve a large amount of data from a remote MySQL 
database. However,
if my application runs with a default heap size, then i will quickly get 
some sort of heap space
exception. The reason is that the ResultSet containing the retrieved 
data is too large. What i have
done to get around this is to stream the results from the database, and 
process rows one by one
as they are streamed (for instance, storing them in a local database, 
like Derby). Of course, things
are most likely behaving more optimal than only transfering one row at a 
time from the database
(yes, some buffers are most likely involved). However, my key point was 
that i do not have to wait
for the entire ResultSet to become ready before i can start iterating 
over the rows. Instead, rows (
be it one or hundred) are retrieved as i iterate over the ResultSet.

So, my question is wether the Derby Driver has this ability too?

Kind regards,
Stian Brattland


My intention with the question was not really to point out that a the 
driver needs to retrive
results in the most ineffective manner as possible.

Bernt M. Johnsen skrev:
> Hi,
>
> Some general remarks (don't remember the exact details of what Derby
> actually does with setFetchSize).
>
>   
>>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 08:43:32):
>>>>>>>>>>>>>                           
>> Hi,
>>
>> I've got a question regarding results streaming. The J/Connector for  
>> MySQL supports results streaming, which means
>> that you can stream and process rows in a ResultSet one by one.  
>> Normally, all rows in a ResultSet will be retrived
>> before you can process the ResultSet. However, i am curious as to wether  
>> this "feature" also exists in Derby?
>>     
>
> Normally, a JDBC driver will retrieve a suitable number of rows, not
> necessarily all, depending on various factors such as row size, number
> of rows resulting from the query and communication buffer size.
>
>   
>> In MySQL, you would do the following to stream results from the database  
>> as you iterate through a ResultSet:
>>
>> stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
>> stmt.setFetchSize(Integer.MIN_VALUE);
>>     
>
> setFetchSize is just a hint to the driver, See
>
> http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)
>
> A well written driver will still try to do thing's optimal, such as
> e.g. fill up the communication buffer with rows to reduce the number
> of roundtrips, regardless of how low you set the fetchSize.
>
> And last, why would you like to force the driver to fetch the rows one
> by one? The only thing you will get fromthat, is extra overhead.
>

Re: Streaming Results

Posted by "Bernt M. Johnsen" <Be...@Sun.COM>.

Hi,

Some general remarks (don't remember the exact details of what Derby
actually does with setFetchSize).

>>>>>>>>>>>> Stian Brattland wrote (2010-01-14 08:43:32):
> Hi,
>
> I've got a question regarding results streaming. The J/Connector for  
> MySQL supports results streaming, which means
> that you can stream and process rows in a ResultSet one by one.  
> Normally, all rows in a ResultSet will be retrived
> before you can process the ResultSet. However, i am curious as to wether  
> this "feature" also exists in Derby?

Normally, a JDBC driver will retrieve a suitable number of rows, not
necessarily all, depending on various factors such as row size, number
of rows resulting from the query and communication buffer size.

>
> In MySQL, you would do the following to stream results from the database  
> as you iterate through a ResultSet:
>
> stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY, java.sql.ResultSet.CONCUR_READ_ONLY);
> stmt.setFetchSize(Integer.MIN_VALUE);

setFetchSize is just a hint to the driver, See

http://jscstage.sfbay.sun.com/javase/6/docs/api/java/sql/Statement.html#setFetchSize(int)

A well written driver will still try to do thing's optimal, such as
e.g. fill up the communication buffer with rows to reduce the number
of roundtrips, regardless of how low you set the fetchSize.

And last, why would you like to force the driver to fetch the rows one
by one? The only thing you will get fromthat, is extra overhead.
-- 
Bernt Marius Johnsen, Staff Engineer
Database Technology Group, Sun Microsystems, Trondheim, Norway