You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@apr.apache.org by Bojan Smojver <bo...@rexursive.com> on 2006/08/14 22:58:28 UTC

DBD: Prepared statements, BLOBs etc.

Some time ago we had quite a few e-mails exchanged about DBD's prepared
statement support and various ways of getting binary data in and out of
prepared statements. So, I prepared these two patches to outline with
code what I had in mind. You may find some of your suggestions in these
patches - this is not by accident.

Before you attempt to compile and run this, let me put it bluntly: these
patches DON'T WORK! They have not been tested at all - they are here
just to express some of the ideas in code. The code in there may (and
most likely does) have serious flaws, even to the point where you could
be scratching you head thinking "what did he want with this". Hopefully,
main ideas will remain visible.

What is in the patches? This (roughly):

- parsing of SQL to find parameters has been moved out of the drivers
and into the upper layer, so it should be identical for all supported
databases

- new functions apr_dbd_pb[v]query/select() have been introduced, which
take "binary" (void *) arguments instead of const char *

- new function apr_dbd_datum_get() has been introduced to fetch binary
data from prepared queries

- structure apr_dbd_blob_t has been defined to pass BLOBs into the query

- SQL to C types have been enumerated

- some situations of in-driver duplication of code have been avoided
(this is mostly as a consequence of new functions being introduced)

- "streaming" of BLOB data using a bucket brigade from MySQL (now there
is some code that *definitely* won't work :-)

- the code should be binary compatible with 1.2.x (i.e. no need to go to
APR-Util 2.x to have this)

- lots and lots of new bugs ;-)

You will find that I haven't done any work on the Oracle driver. It is
simply because I really have no idea about the API. But, I tried to
incorporate some of the concerns raised into the patches (i.e. in terms
of BLOBs). I guess SQLite3, PGSQL and MySQL stuff should be sufficient
enough to represent the ideas.

Criticise away - this is what the patches are for!

PS. Due to size issues on the list, I'll send the MySQL patch in a
separate e-mail.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Tue, 2006-08-15 at 06:58 +1000, Bojan Smojver wrote:

> PS. Due to size issues on the list, I'll send the MySQL patch in a
> separate e-mail.

MySQL patch.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Thu, 2007-01-18 at 14:17 +1100, Geoffrey Giesemann wrote:

> Can you try:
> 
> INSERT INTO Print_media(Product_id, Ad_id, Ad_sourcetext VALUES
>  (2004, 1, TO_LOB(:SRCTXT))
>            ^^^^^^
> instead?

Unfortunately, changing SQL wasn't an option, as this is something users
carve by hand and give to APU DBD. The goal is that it should "just
work", for most common cases.

> This suggests that the implicit data type for :SRCTXT is a LONG (for
> some reason).  Is there a more specific bind type you could use? (other
> than SQLT_CHR)

The fix was actually to use types depending on the nature of LOB. For
CLOBs, either SQLT_CHR or SQLT_LNG worked fine. But for BLOBs, one needs
to use SQLT_LBI (or maybe something else I'm yet unaware of). And since
my test table had a mix of both and Oracle wouldn't tell me on which
column the problem actually was, plus the fact that I didn't find the
SQLT_LBI mentioned anywhere in the docs, it took a while to figure
things out. 

Thanks for your help,
-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Geoffrey Giesemann <ge...@cs.rmit.edu.au>.

On Thu, Jan 18, 2007 at 01:37:49PM +1100, Bojan Smojver wrote:
> 
> Well, I'm facing some major grief regarding this. I found another
> example, which would suggest more or less the same thing here, but using
> SQLT_CHR type:
> 
> http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14249/adlob_data_interface.htm
> 
> The text and code is:
> 
> ----------------------------------
> Binding LONG Data for LOB Columns in Binds Greater Than 4000 Bytes
> 
> The following example illustrates binding character data for a LOB
> column:
> 
> void simple_insert()
> {
>   word buflen;
>   text buf[5000];
>   text *insstmt = (text *) "INSERT INTO Print_media(Product_id, Ad_id,\
>                   Ad_sourcetext) VALUES (2004, 1, :SRCTXT)";
>  
>   OCIStmtPrepare(stmthp, errhp, insstmt, (ub4)strlen((char *)insstmt), 
>                 (ub4) OCI_NTV_SYNTAX, (ub4) OCI_DEFAULT);
>   
>   OCIBindByName(stmthp, &bndhp[0], errhp,
>                 (text *) ":SRCTXT", (sb4) strlen((char *) ":SRCTXT"),
>                 (dvoid *) buf, (sb4) sizeof(buf), SQLT_CHR,
>                 (dvoid *) 0, (ub2 *) 0, (ub2 *) 0,
>                 (ub4) 0, (ub4 *) 0, (ub4) OCI_DEFAULT);
>  
>   memset((void *)buf, (int)'A', (size_t)5000);
>   OCIStmtExecute(svchp, stmthp, errhp, (ub4) 1, (ub4) 0,
>                  (const OCISnapshot*) 0, (OCISnapshot*) 0, (ub4) OCI_DEFAULT);
> }
> ----------------------------------
> 

(This is more of an Oracle query than an APR query, but here goes)

What's the actual type of the Ad_sourcetext column?

Can you try:

INSERT INTO Print_media(Product_id, Ad_id, Ad_sourcetext VALUES
 (2004, 1, TO_LOB(:SRCTXT))
           ^^^^^^

instead?

> 
> ----------------------------------
> Execute error -1: ORA-01461: can bind a LONG value only for insert into
> a LONG column
> ----------------------------------
>
>
> Values up to and including 4000 bytes work fine (minus the character
> conversion bit, which I still need to figure out). The error appears
> with 4001+ bytes.
>

This suggests that the implicit data type for :SRCTXT is a LONG (for
some reason).  Is there a more specific bind type you could use? (other
than SQLT_CHR)

(I will disclaim that I've never done any C-based OCI programming
before, but I do know my way around Oracle)

--Geoff

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Thu, 2007-01-18 at 13:37 +1100, Bojan Smojver wrote:

> Well, I'm facing some major grief regarding this.

Solved. BLOBs need to be bound as SQLT_LBI and CLOBs as SQLT_LNG or
SQLT_CHR.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Chris Darroch <ch...@pearsoncmg.com>:

> At any rate, it would be nice if OCI_NEW_LENGTH_SEMANTICS actually
> worked as promised.  :-)

He, he... I see you also have a lot of confidence in Oracle docs. :-)

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

> BTW, do you think OCI_NEW_LENGTH_SEMANTICS would be worth adding to
> OCIEnvCreate() call? Apparently (according to the documentation :-), it
> does this:
> 
> OCI_NEW_LENGTH_SEMANTICS - byte-length semantics is used consistently
> for all handles, regardless of character sets.
> 
> Hopefully, this would then make sure OCILobRead() returns proper number
> of bytes read.

   Might do, indeed, but it appears to only be available starting with 9i.
Maybe that's sufficient for APR-util; 8i isn't supported anymore by
Oracle, IIRC.

   (I say "might do" because I've tried using OCI_SHARED with OCIEnvCreate()
and discovered that it merely triggered a fascinating bug, well-documented
by other people in various bug reports to Oracle, and never fixed
acknowledged as such.  There may be a fix in 10g for it; I haven't
checked.  At any rate, it would be nice if OCI_NEW_LENGTH_SEMANTICS actually
worked as promised.  :-)

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Wed, 2007-01-17 at 20:44 -0800, Chris Darroch wrote:

>    You could be hitting a character set conversion issue between
> your client and the server, or simply a poor piece of Oracle documentation.

The latter. Thanks for your help.

BTW, do you think OCI_NEW_LENGTH_SEMANTICS would be worth adding to
OCIEnvCreate() call? Apparently (according to the documentation :-), it
does this:

------------------------
OCI_NEW_LENGTH_SEMANTICS - byte-length semantics is used consistently
for all handles, regardless of character sets.
------------------------

Hopefully, this would then make sure OCILobRead() returns proper number
of bytes read.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Hi --

Bojan Smojver wrote:

> Looks pretty simple, doesn't it? Well, I tried with Oracle XE 10.2.0.1
> client and Instant Client 10.2.0.3, all connecting to Oracle XE Server
> (Universal and not) 10.2.0.3 on Linux - no go. I get:
> 
> Execute error -1: ORA-01461: can bind a LONG value only for insert into
> a LONG column

   You could be hitting a character set conversion issue between
your client and the server, or simply a poor piece of Oracle documentation.
A few possibilities are mentioned here:

http://groups.google.com/group/perl.dbi.users/browse_thread/thread/1659d16b3092699e/63df8e352aceec7a?lnk=st&q=ora+1461&rnum=7&hl=en#63df8e352aceec7a

They talk about "ML" or MetaLink in that thread; that's the central
Oracle support site.  Several "notes" from MetaLink are partially quoted.

> Now, the strange thing is that the size of BLOB and CLOB in my table is
> reported as 4000 bytes. Aren't these things supposed to hold at least
> gigabytes?

   Yes, they should.  But, the documentation makes fairly clear that
a 4 KB limit is imposed on a lot of internal conversions and
operations.  If you can get it working, more power to you!  You
asked elsewhere about the murky subject of Oracle character sets and
I'm afraid I'm far from expert on it.

   Personally I'm beginning to wonder if it wouldn't make more sense
to effectively return APR_ENOTIMPL and only provide LOB handling for
"sane" DB servers.  Maybe a driver-specific set of calls could provide
special handling for those that really needed it.

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Mon, 2007-01-15 at 13:06 +1100, Bojan Smojver wrote:

> So, one doesn't even need the LOB descriptor or anything, let alone  
> the name of the table or column. Did I miss something important here?  
> It looks pretty straightforward to me...

Well, I'm facing some major grief regarding this. I found another
example, which would suggest more or less the same thing here, but using
SQLT_CHR type:

http://download-east.oracle.com/docs/cd/B19306_01/appdev.102/b14249/adlob_data_interface.htm

The text and code is:

----------------------------------
Binding LONG Data for LOB Columns in Binds Greater Than 4000 Bytes

The following example illustrates binding character data for a LOB
column:

void simple_insert()
{
  word buflen;
  text buf[5000];
  text *insstmt = (text *) "INSERT INTO Print_media(Product_id, Ad_id,\
                  Ad_sourcetext) VALUES (2004, 1, :SRCTXT)";

  OCIStmtPrepare(stmthp, errhp, insstmt, (ub4)strlen((char *)insstmt), 
                (ub4) OCI_NTV_SYNTAX, (ub4) OCI_DEFAULT);

  OCIBindByName(stmthp, &bndhp[0], errhp,
                (text *) ":SRCTXT", (sb4) strlen((char *) ":SRCTXT"),
                (dvoid *) buf, (sb4) sizeof(buf), SQLT_CHR,
                (dvoid *) 0, (ub2 *) 0, (ub2 *) 0,
                (ub4) 0, (ub4 *) 0, (ub4) OCI_DEFAULT);

  memset((void *)buf, (int)'A', (size_t)5000);
  OCIStmtExecute(svchp, stmthp, errhp, (ub4) 1, (ub4) 0,
                 (const OCISnapshot*) 0, (OCISnapshot*) 0, (ub4) OCI_DEFAULT);
}
----------------------------------

Looks pretty simple, doesn't it? Well, I tried with Oracle XE 10.2.0.1
client and Instant Client 10.2.0.3, all connecting to Oracle XE Server
(Universal and not) 10.2.0.3 on Linux - no go. I get:

----------------------------------
Execute error -1: ORA-01461: can bind a LONG value only for insert into
a LONG column
----------------------------------

Values up to and including 4000 bytes work fine (minus the character
conversion bit, which I still need to figure out). The error appears
with 4001+ bytes.

Now, the strange thing is that the size of BLOB and CLOB in my table is
reported as 4000 bytes. Aren't these things supposed to hold at least
gigabytes?

Any suggestions welcome...

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Mon, 2007-01-15 at 21:42 -0800, Chris Darroch wrote:

>  No, no flames implied!

Apologies if I created any misunderstanding. I was just giving you
permission to flame me next time, since this was the second time you
patiently explained the same problem to the list and my forgetful self.

Sorry... :-(

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

> Seriously, I just wasn't sure any more where the weirdness came from, so
> I wanted to make sure. Thanks for your explanation and if I ask again,
> feel free to flame me (although I'm bound to forget that quickly, given
> my powers of remembering things :-).

   No, no flames implied!  It's fairly painful API to work with and
sufficiently non-intuitive that I always have go back and read the
docs again whenever I pick it up.  I'm sorry I haven't gotten around to
a version to match up with your previous work on the DBD layer, but work
has kept me busy on other things.

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Mon, 2007-01-15 at 16:02 -0800, Chris Darroch wrote:

>    The best resource I've found for explicating the mysteries of OCI
> is the Perl DBD::Oracle module:
> 
> http://search.cpan.org/dist/DBD-Oracle/Oracle.pm#Simple_Usage
> 
>    I quoted them once before;

Sorry, I kind of remembered you did, but I couldn't locate the thread.

>  skim down to the paragraph that begins
> "Worse, there's no way ...":
> 
> http://marc.theaimsgroup.com/?l=apr-dev&m=114909480018188&w=2
> 
>    In short, DBD::Oracle performs heroic feats to learn the table and
> field names of a particular LOB, so it can retrieve the appropriate LOB
> locator.  IIRC, this is because, when inserting into a table with LOB
> columns, you have to insert a row with empty LOB placeholders.  Then you
> have to retrieve back the locators for each LOB -- these don't exist
> until you do the insert -- and write the final values into each LOB
> using its locator.

Aha, there is more to OCI than meets the eye. It sucks :-(

>    You can't retrieve back the locators without the table and field
> name (or position), but of course these aren't provided in a neat package
> by the user's SQL statement.  So, in order to make this whole process
> appear like an atomic insert, DBD::Oracle parses your SQL to try to
> determine the table name, and then looks for a LOB in that table.
> If that works, it all seems like a single atomic insert to the user.

OUCH! This is even worse - we'd need to know that we're actually doing
an INSERT and then act upon it. It *really* sucks! :-(

>    But, if there are several LOBs, hints from the user are required,
> because there's just no auto-magic way of knowing which LOB should get
> which input data.

Right. I think I'm starting to understand now.

>    My own sense was that APR would probably be better off without
> the SQL-parsing magic of DBD::Oracle, since even at its best, it can't
> handle every case.  I'd guess that it can also be fooled by especially
> hairy SQL statements.  Hence, this whole notion of requiring a full
> set of identifiers for LOBs, across the board.

I agree. Parsing SQL should be kept to a minimum in APU - things can go
horribly wrong if we don't do it quite right.

I started playing with the driver already and since I already invented
the opaque apr_dbd_blob_t and accompanying functions, there is nothing
that needs doing once we decide to switch to LOB locators - the required
info will be there. And for the character mode, we can just keep
length:table:column:payload syntax (as discussed previously) and we
should be fine (which is also done).

>    Or, it might be better for APR to just ignore Oracle and return,
> in essence, APR_ENOTIMPL in some cases.  (Since drivers don't return
> APR error codes from all functions, this might not be literally possible.)

I'll try to implement as much a possible (most likely with SQLT_LNG for
now, to keep code simple), but if I get stuck, the APR_ENOTIMPL may be
the only option.

> If memory serves, PostgreSQL also has some LOB weirdness to it.

It has a thingy called bytea, which is a binary string. If you whack it
in through a prepared statement, there is little that needs to be done
to it, apart from telling PostgreSQL that it's a binary and passing the
length in. If it's part of the literal SQL query, then it needs to be
escaped, but that's not something we should be worried about, as the
preferred way is to do prepared statements anyway.

> Maybe it would be better to offer LOB support only where the database
> conforms to a "sane standard", whatever that is, and allow drivers to
> offer support for non-standard weirdness via some other mechanism.

We should be able to work around at least some of that stuff. I'll just
keep the existing code for apr_dbd_blob_t and when I'm done modifying
the driver, I'll post the result, so that everyone can have a good
laugh :-).

Seriously, I just wasn't sure any more where the weirdness came from, so
I wanted to make sure. Thanks for your explanation and if I ask again,
feel free to flame me (although I'm bound to forget that quickly, given
my powers of remembering things :-).

>    As I'm not an APR committer, I have no vote in this,

That is a real shame.

Thanks for your time. Hopefully, I'll have some code for you to review
in the coming weeks.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

> I was actually wondering about the need to pass the name of the table
> and the name of column to p[b]query/select, which was the bit that
> wasn't clear to me. I just can't see the need for that, after my brief
> reading of OCI documentation. Which would then enable us to use
> apr_datum_t for passing binary BLOB/CLOB data into pbquery/select,
> instead of inventing yet another structure and a set of functions to
> deal with it (as it would be opaque).

   The best resource I've found for explicating the mysteries of OCI
is the Perl DBD::Oracle module:

http://search.cpan.org/dist/DBD-Oracle/Oracle.pm#Simple_Usage

   I quoted them once before; skim down to the paragraph that begins
"Worse, there's no way ...":

http://marc.theaimsgroup.com/?l=apr-dev&m=114909480018188&w=2

   In short, DBD::Oracle performs heroic feats to learn the table and
field names of a particular LOB, so it can retrieve the appropriate LOB
locator.  IIRC, this is because, when inserting into a table with LOB
columns, you have to insert a row with empty LOB placeholders.  Then you
have to retrieve back the locators for each LOB -- these don't exist
until you do the insert -- and write the final values into each LOB
using its locator.

   You can't retrieve back the locators without the table and field
name (or position), but of course these aren't provided in a neat package
by the user's SQL statement.  So, in order to make this whole process
appear like an atomic insert, DBD::Oracle parses your SQL to try to
determine the table name, and then looks for a LOB in that table.
If that works, it all seems like a single atomic insert to the user.

   But, if there are several LOBs, hints from the user are required,
because there's just no auto-magic way of knowing which LOB should get
which input data.

   My own sense was that APR would probably be better off without
the SQL-parsing magic of DBD::Oracle, since even at its best, it can't
handle every case.  I'd guess that it can also be fooled by especially
hairy SQL statements.  Hence, this whole notion of requiring a full
set of identifiers for LOBs, across the board.

   Or, it might be better for APR to just ignore Oracle and return,
in essence, APR_ENOTIMPL in some cases.  (Since drivers don't return
APR error codes from all functions, this might not be literally possible.)
If memory serves, PostgreSQL also has some LOB weirdness to it.
Maybe it would be better to offer LOB support only where the database
conforms to a "sane standard", whatever that is, and allow drivers to
offer support for non-standard weirdness via some other mechanism.

   As I'm not an APR committer, I have no vote in this, but I'm certainly
interested in opinions and in the outcome, as I'm a user of apr_dbd +
mod_dbd + Oracle + LOBs.  Thanks!

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Mon, 2007-01-15 at 14:37 -0800, Chris Darroch wrote:

>    For binary BLOB data, that strlen() is going to be a problem.

Yep, that's right. And for the CLOBs it is going to be inefficient, if
those things are big. That's why we can go with an approach that, in
character mode, encodes both BLOB and CLOB as length:payload. In binary
mode, we can just pass this value in apr_datum_t. In both cases, no
strlen() would be needed any more.

Thanks for confirming this.

> At the very least, a way of knowing the length of binary data is
> required.  Beyond that, it looks like Oracle documents a number of
> limitations on this approach:
> 
> http://www.stanford.edu/dept/itss/docs/oracle/10g/appdev.101/b10779/oci05bnd.htm#428395
> 
> Although it's not mentioned, my own guess is that there are performance
> reasons for using LOB locators as well, but that's just a guess.
> The issues I suspect would trip users up would be the ones that sometimes
> impose a 4 KB limit on the data, depending on the particulars of the
> SQL statement.  The implicit use of temporary tables might be a surprise
> too, I suppose.

Yeah, we can sure use LOB locators if that's more efficient. Thanks for
the pointers.

I was actually wondering about the need to pass the name of the table
and the name of column to p[b]query/select, which was the bit that
wasn't clear to me. I just can't see the need for that, after my brief
reading of OCI documentation. Which would then enable us to use
apr_datum_t for passing binary BLOB/CLOB data into pbquery/select,
instead of inventing yet another structure and a set of functions to
deal with it (as it would be opaque).

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Mon, 2007-01-15 at 23:58 +0100, Antonio Alvarado Hernández wrote:

>  - Binding by position is same as your example bellow.

In APU DBD, we always bind by position, so we should be good.

> If you binding as at point (1). above, you are limited to 4GB of
> length (not 128 terabytes ;-)

He, he... I think we'll be OK with this. For a while ;-)

> (3). apr_dbd_lob_t can be used for real LOB piecewise manipulation.  I
> don't know if this feature is needed by other drivers (e.g. MySQL,
> Postgres, and so on)

We could use it (actually, I have patches that do so :-), but it is not
mandatory. Other structures we already have in APU may do just fine
(e.g. apr_datum_t). I just wanted to stay away from introducing
something that is not entirely required. KISS and all.

> I'm not an expert in OCI nor APR-DBD but I feel this maybe my 2 cents :-)

Thanks. Any info on OCI is welcome. To me, OCI is still a bit of a
mystery :-)

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Antonio Alvarado Hernández <aa...@gmail.com>.

Hello, I'm missing something... these are my thoughs:

On 1/15/07, Bojan Smojver <bo...@rexursive.com> wrote:
>
> Which brings me back to the once mentioned Oracle weirdness, where the
> name of the table and the name of the column are supposedly required
> in order to bind a BLOB/CLOB as INPUT parameters (i.e. to
> p[b]query/select). From what I can see in the docs (Oracle Call
> Interface Programmer's Guide), simple things like this are allowed in
> order to bind LOBs:

(1). I agreed.

 - Binding by name refers to parameter names (e.g. "p1", "p2" and "p3"
in "INSERT INTO lob_long_tab (C1, C2, L) VALUES (:p1, :p2, :p3)"), not
table/columns name.

 - Binding by position is same as your example bellow.

>
> ----------------------------------------------------------
> CREATE TABLE foo (a INTEGER );
> CREATE TYPE lob_typ AS OBJECT (A1 CLOB );
> CREATE TABLE lob_long_tab (C1 CLOB, C2 CLOB, CT3 lob_typ, L LONG);
>
> void insert()                 /* A function in an OCI program */
> {
>     /* The following is allowed */
>     ub1 buffer[8000];
>     text *insert_sql = (text *) "INSERT INTO lob_long_tab (C1, C2, L) \
>                         VALUES (:1, :2, :3)";
>     OCIStmtPrepare(stmthp, errhp, insert_sql, strlen((char*)insert_sql),
>                   (ub4) OCI_NTV_SYNTAX, (ub4) OCI_DEFAULT);
>     OCIBindByPos(stmthp, &bindhp[0], errhp, 1, (dvoid *)buffer, 8000,
>                  SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);
>     OCIBindByPos(stmthp, &bindhp[1], errhp, 2, (dvoid *)buffer, 8000,
>                  SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);
>     OCIBindByPos(stmthp, &bindhp[2], errhp, 3, (dvoid *)buffer, 2000,
>                  SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);
>     OCIStmtExecute(svchp, stmthp, errhp, 1, 0, (OCISnapshot *) NULL,
>                                (OCISnapshot *) NULL, OCI_DEFAULT);
> }
> ----------------------------------------------------------
>
> So, one doesn't even need the LOB descriptor or anything, let alone
> the name of the table or column. Did I miss something important here?
> It looks pretty straightforward to me...

(2). Yes.  Lob descriptor is for streaming LOB manipulation as used
with OCILobRead2 and OCILobWriter2 (Oracle call this piecewise fetch
or update).  More at
http://download-uk.oracle.com/docs/cd/B19306_01/appdev.102/b14250/oci07lob.htm#sthref1158

If you binding as at point (1). above, you are limited to 4GB of
length (not 128 terabytes ;-)

> If you remember, that was the reason for creating patches that would
> encode (in CHARACTER (i.e. existing) mode) BLOB/CLOB values as:
>
> length:table:column:payload
>
>  From what I can see, we could get away with just length:payload, just
> like we can with all other databases we support.
>
> In binary mode, we wouldn't have to specify table/column at all.
> Again, size and payload should be just fine. This would enable us to
> use some existing structures for all this, instead of having to invent
> apr_dbd_blob_t.

(3). apr_dbd_lob_t can be used for real LOB piecewise manipulation.  I
don't know if this feature is needed by other drivers (e.g. MySQL,
Postgres, and so on)

> Anyone familiar with OCI, please speak up. Chris?
>
> --
> Bojan
>

I'm not an expert in OCI nor APR-DBD but I feel this maybe my 2 cents :-)

Regards,
Antonio

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> I was silly enough to download Oracle XE and start playing with the
> Oracle driver...

And the result of that madness is attached :-)

You'll notice that I didn't use LOB locators to do binding in the  
Oracle driver, as this would require unusual code gymnastics, as  
previously discussed. Also, some things are cheap and inefficient code  
reuse cases. But, it may be some kind of start...

PS. The whole thing is in the tarball due to size. You'll find there  
(crude) bits that I used for testing.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> This would enable us to use some
> existing structures for all this, instead of having to invent
> apr_dbd_blob_t.

For instance, apr_datum_t.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

>     OCIBindByPos(stmthp, &bindhp[0], errhp, 1, (dvoid *)buffer, 8000,
>                  SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);

> So, one doesn't even need the LOB descriptor or anything, let alone  
> the name of the table or column. Did I miss something important here?  

   Well, for anything to do with OCI, I always have to go back to
basics and try to refresh my memory about all the complexities,
so my apologies if this turns out to be incorrect.

   On a quick glance, I think what they're showing here is essentially
what we do now to handle LOBs, and that's treat them like LONGs
(SQLT_LNG means SQL Type LONG) when the '%L' placeholder is used.
The current apr_dbd_oracle.c then uses strlen() to determine the length
of the input argument (in the example above that's the 8000 value).
Here's a relevant bit of apr_dbd_oracle.c:

    case APR_DBD_ORACLE_LOB:
        /* requires strlen() over large data, which may fail for binary */
        statement->args[i].value.raw = va_arg(args, char*);
        statement->args[i].len =
            strlen(statement->args[i].value.stringval); 
        sql->status = OCIBindByPos(statement->stmt,
                                   &statement->args[i].bind,
                                   sql->err, i+1,
                                   (void*)statement->args[i].value.raw,
                                   statement->args[i].len, SQLT_LNG,
                                   &statement->args[i].ind,
                                   NULL,
                                   (ub2) 0, (ub4) 0,
                                   (ub4 *) 0, OCI_DEFAULT);

   For binary BLOB data, that strlen() is going to be a problem.
At the very least, a way of knowing the length of binary data is
required.  Beyond that, it looks like Oracle documents a number of
limitations on this approach:

http://www.stanford.edu/dept/itss/docs/oracle/10g/appdev.101/b10779/oci05bnd.htm#428395

Although it's not mentioned, my own guess is that there are performance
reasons for using LOB locators as well, but that's just a guess.
The issues I suspect would trip users up would be the ones that sometimes
impose a 4 KB limit on the data, depending on the particulars of the
SQL statement.  The implicit use of temporary tables might be a surprise
too, I suppose.

   Like I said, that's a quick response based on a skim of the docs;
apologies in advance for any errors or misinformation.

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Chris Darroch <ch...@pearsoncmg.com>:

> 2) Alas, for Oracle, there should probably also be the types
>    APR_DBD_TYPE_CLOB

I was silly enough to download Oracle XE and start playing with the  
Oracle driver...

Which brings me back to the once mentioned Oracle weirdness, where the  
name of the table and the name of the column are supposedly required  
in order to bind a BLOB/CLOB as INPUT parameters (i.e. to  
p[b]query/select). From what I can see in the docs (Oracle Call  
Interface Programmer's Guide), simple things like this are allowed in  
order to bind LOBs:

----------------------------------------------------------
CREATE TABLE foo (a INTEGER );
CREATE TYPE lob_typ AS OBJECT (A1 CLOB );
CREATE TABLE lob_long_tab (C1 CLOB, C2 CLOB, CT3 lob_typ, L LONG);

void insert()                 /* A function in an OCI program */
{
    /* The following is allowed */
    ub1 buffer[8000];
    text *insert_sql = (text *) "INSERT INTO lob_long_tab (C1, C2, L) \
                        VALUES (:1, :2, :3)";
    OCIStmtPrepare(stmthp, errhp, insert_sql, strlen((char*)insert_sql),
                  (ub4) OCI_NTV_SYNTAX, (ub4) OCI_DEFAULT);
    OCIBindByPos(stmthp, &bindhp[0], errhp, 1, (dvoid *)buffer, 8000,
                 SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);
    OCIBindByPos(stmthp, &bindhp[1], errhp, 2, (dvoid *)buffer, 8000,
                 SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);
    OCIBindByPos(stmthp, &bindhp[2], errhp, 3, (dvoid *)buffer, 2000,
                 SQLT_LNG, 0, 0, 0, 0, 0, (ub4) OCI_DEFAULT);
    OCIStmtExecute(svchp, stmthp, errhp, 1, 0, (OCISnapshot *) NULL,
                               (OCISnapshot *) NULL, OCI_DEFAULT);
}
----------------------------------------------------------

So, one doesn't even need the LOB descriptor or anything, let alone  
the name of the table or column. Did I miss something important here?  
It looks pretty straightforward to me...

If you remember, that was the reason for creating patches that would  
encode (in CHARACTER (i.e. existing) mode) BLOB/CLOB values as:

length:table:column:payload

 From what I can see, we could get away with just length:payload, just  
like we can with all other databases we support.

In binary mode, we wouldn't have to specify table/column at all.  
Again, size and payload should be just fine. This would enable us to  
use some existing structures for all this, instead of having to invent  
apr_dbd_blob_t.

Anyone familiar with OCI, please speak up. Chris?

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> In the patches, I return APR_ENOENT from datum_get() if I find a null
> value, so no need for apr_dbd_is_null() call. Except that I've forgotten
> to do this for SQLite2... Sorry, I'll try to fix that.

Here is another tarball with the patches that hopefully fixes this for  
SQLite2.

I included a new version of the test program that shows what I meant  
with this and exercises databases that don't have support for prepared  
statements. I included the build script for the test program too, just  
in case.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Alex Dubov <oa...@yahoo.com>.

I was not following the topic lately, so excuse me if I'm out of sync.

> Which brings me to my reasoning why apr_dbd_blob_t doesn't contain a  
> brigade (one of the proposals that was floating on the list), but  
> rather a flat data pointer and size (in case you were wondering). The  
> caller is in a much better position to control memory allocation  
> issues here and leaving the "flattening" of a brigade to DBD functions  
> can bring about all kinds of unwanted memory allocation effects. In  
> the end, most databases deal with a flat chunk of memory and that's  
> what callers mostly have anyway.

It also was my idea on going this way some time ago, but now I see the light. I actually believe
that there is a considerable advantage in using buckets (not brigades) when passing binary data to
and from the database. And heap buckets behave in a way user will expect from *_blob_t or any
similar structure.


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> We could have others - for instance apr_dbd_file_t (for
> APR_DBD_TYPE_BFILE), which could look something like this:
>
> struct apr_dbd_file {
>     apr_file_t *file;   /**< file to read the data from */
>     apr_size_t size;    /**< size to read */
>     apr_off_t  offset;  /**< offset to read from */
>     const char *table;  /**< table name (used for Oracle) */
>     const char *column; /**< column name (used for Oracle) */
> };

Actually, please disregard this part of my reply. I'll blame it on  
early morning, although we all know that's not true ;-)

Seriously, for something like this to be useful, databases would need  
to know how to take an open file descriptor, offset and size and use  
that as input to execPrepared (and similar). These functions usually  
expect a pointer to a chunk of memory and size or nul terminated  
string, so the above probably wouldn't be useful at all.

Which brings me to my reasoning why apr_dbd_blob_t doesn't contain a  
brigade (one of the proposals that was floating on the list), but  
rather a flat data pointer and size (in case you were wondering). The  
caller is in a much better position to control memory allocation  
issues here and leaving the "flattening" of a brigade to DBD functions  
can bring about all kinds of unwanted memory allocation effects. In  
the end, most databases deal with a flat chunk of memory and that's  
what callers mostly have anyway.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> And here is the patch that does exactly that. I'll follow this up with
> another e-mail where I'll include some other, related stuff. This is
> due to list size restrictions.

As promised...

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> However, we can always
> go opaque and make sure nobody relies on this structure being certain
> way.

And here is the patch that does exactly that. I'll follow this up with  
another e-mail where I'll include some other, related stuff. This is  
due to list size restrictions.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Mon, 2006-09-25 at 11:28 -0700, Chris Darroch wrote:

>    I guess my remaining hestitation relates to the fact that there's
> a lot of reliance on atoi() and friends here, and that with certain
> drivers you might see numeric data converted from the DB's internal
> format to a string (because the driver asks for string data values),
> using the DB's internal conversion mechanism, and then converted
> back to a numeric value by atoi(), apr_atoi64(), etc. because the
> caller asked the driver for a specific numeric format.  I'm not a
> floating-point expert, I confess ... could there be subtle data
> conversion problems lurking in there?
> 
>    OTOH, if all the drivers work the same way and all use atoi()
> and friends, then any variations between drivers in extracting some
> known floating-point value from the DBs would have to be caused
> by the DBs' internal conversions to strings ... does that let APR
> off the hook?  I just don't know, myself.

The proper thing to do would be to get a float/int/double etc. directly
from the SQL backend if at all possible to minimise conversion problems.
We can always work on that later, as it is an implementation detail and
it shouldn't affect how API works.

Obviously, some of the functions would have to be changed internally to
facilitate the fact that we'd have all kinds of different data fetched
from the backend. I just didn't want to complicate things too much with
the first iteration. Also, present code does strings only, so we should
not be worse off by doing conversions, as we have to do them now after
calling get_entry().

In fact, since I didn't do any work with Oracle driver, you can
implement that approach there from the word go :-).

>    I'm not really sure what's best here.  Because these structures
> for LOB data will necessarily vary across drivers, my hestitation
> remains about putting the structure definition right in apr_dbd.h.
> I *think* that appending fields to structures might be permitted across
> minor version changes, without breaking the ABI, but I'm not 100% sure.
> That might suffice to handle any new fields required by support for
> additional data types and/or databases in the future.  But I'd feel
> better if an API/ABI rules guru weighed in on that.  :-)

Yeah, it would be good if someone else chimed in. However, we can always
go opaque and make sure nobody relies on this structure being certain
way. Then we'd have:

apr_dbd_blob_data_set/get()
apr_dbd_blob_length_set/get()
apr_dbd_blob_table_set/get()
apr_dbd_blob_column_set/get()
...

And our structure would always be "compatible". This is not a
complicated thing to implement at all and if that's preferred, I can
work on it.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

> On Tue, 2006-09-05 at 23:39 -0700, Chris Darroch wrote:
[snip]
>>    Now, I may very well be missing something here, but this seems
>>    backwards to me.  I would have thought that you'd want to pass
>>    an apr_dbd_type_e *type parameter instead, and then datum_get()
>>    would write into that location the type of the column, and
>>    write into the void *data parameter the actual data, formatted
>>    as one would expect for the returned type.  So if the column
>>    contains integers, you'd get a type of APR_DBD_TYPE_INT and
>>    a pointer to a signed int, and so forth.
> 
> I was thinking about doing it this way as well and maybe it's the right
> way - not sure. Here is why I chose to do it the way I did:
> 
> In majority (or all?) of my own database related work, I would know the
> column type ahead of getting the data, because I was usually the one
> that created the table, had prior knowledge of its structure or could
> find out in some SQL way. So, I was thinking, why wait until after the
> call to figure out what the column type is (and have another if/switch
> construct after datum_get()) - let's specify it upfront.
> 
> The next thing that felt good about this approach is that the storage is
> entirely managed by the caller. So, I already have my ints, shorts,
> longs, brigades and other data types allocated - then I call datum_get()
> with pointers to the allocated data, and I get everything populated and
> ready to be used (minus the "is it null?" bit - see below).
> 
> And, it also gives us the opportunity to convert various data on the fly
> with datum_get(), without any extra programming effort on caller's
> behalf (to a degree, of course).

   OK, I see where you're coming from ... I see the value of having
the caller request the data format, and letting any problems be on
their head (if they ask for a integer and the string value from the
DB contains "foo", well, that's their problem).

   I guess my remaining hestitation relates to the fact that there's
a lot of reliance on atoi() and friends here, and that with certain
drivers you might see numeric data converted from the DB's internal
format to a string (because the driver asks for string data values),
using the DB's internal conversion mechanism, and then converted
back to a numeric value by atoi(), apr_atoi64(), etc. because the
caller asked the driver for a specific numeric format.  I'm not a
floating-point expert, I confess ... could there be subtle data
conversion problems lurking in there?

   OTOH, if all the drivers work the same way and all use atoi()
and friends, then any variations between drivers in extracting some
known floating-point value from the DBs would have to be caused
by the DBs' internal conversions to strings ... does that let APR
off the hook?  I just don't know, myself.

> In the patches, I return APR_ENOENT from datum_get() if I find a null
> value, so no need for apr_dbd_is_null() call. Except that I've forgotten
> to do this for SQLite2... Sorry, I'll try to fix that.

   Gotcha, missed that, sorry!

> In other words, in _prepare, we get in %p<something>, which means CLOB,
> LOB, FILE or any other additional types that Oracle (and other DBs) may
> support.
> 
> Then, in the _p[b]query/select (since we know the type from _prepare),
> we get ourselves just a regular apr_dbd_blob_t, but we know to call the
> correct bind functions and use the correct parts of it (i.e. length, or
> table name, table name etc.). The apr_dbd_blob_t is just a data delivery
> mechanism. We could have others - for instance apr_dbd_file_t (for
> APR_DBD_TYPE_BFILE) [...]

   I've never used Oracle BFILEs, but I think you would actually need
to pass a directory "alias" (a string) and the filename; these would
then go to the OCILobFileSetName() function.

   I'm not really sure what's best here.  Because these structures
for LOB data will necessarily vary across drivers, my hestitation
remains about putting the structure definition right in apr_dbd.h.
I *think* that appending fields to structures might be permitted across
minor version changes, without breaking the ABI, but I'm not 100% sure.
That might suffice to handle any new fields required by support for
additional data types and/or databases in the future.  But I'd feel
better if an API/ABI rules guru weighed in on that.  :-)

   At any rate, thanks so much for the work on this binary arguments
patch!  I'm sorry to be stuck in another project for work and to have
so little time right now to devote to this.  Thanks again!

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Tue, 2006-09-05 at 23:39 -0700, Chris Darroch wrote:

>    Firstly, Bojan, thank you for all the work on these patches!  I read
> through them last week and reviewed quickly your latest updates, and
> really, I don't see much to discuss -- it's all looking very close to
> what I might hope for!

No worries at all. Thanks for reviewing!

>    Here are my few thoughts so far.  They may not be completely cogent.
> 
> 1) The one thing that stood out for me was that apr_dbd_datum_get()
>    seems to take an apr_dbd_type_e parameter.  The drivers seem to
>    then format the result data from the DB into the requested type.
>    This means that you can ask for an APR_DBD_TYPE_NULL, for example,
>    regardless of what the data is.
> 
>    Now, I may very well be missing something here, but this seems
>    backwards to me.  I would have thought that you'd want to pass
>    an apr_dbd_type_e *type parameter instead, and then datum_get()
>    would write into that location the type of the column, and
>    write into the void *data parameter the actual data, formatted
>    as one would expect for the returned type.  So if the column
>    contains integers, you'd get a type of APR_DBD_TYPE_INT and
>    a pointer to a signed int, and so forth.

I was thinking about doing it this way as well and maybe it's the right
way - not sure. Here is why I chose to do it the way I did:

In majority (or all?) of my own database related work, I would know the
column type ahead of getting the data, because I was usually the one
that created the table, had prior knowledge of its structure or could
find out in some SQL way. So, I was thinking, why wait until after the
call to figure out what the column type is (and have another if/switch
construct after datum_get()) - let's specify it upfront.

The next thing that felt good about this approach is that the storage is
entirely managed by the caller. So, I already have my ints, shorts,
longs, brigades and other data types allocated - then I call datum_get()
with pointers to the allocated data, and I get everything populated and
ready to be used (minus the "is it null?" bit - see below).

And, it also gives us the opportunity to convert various data on the fly
with datum_get(), without any extra programming effort on caller's
behalf (to a degree, of course).

>    Further, if the data in that row for that column happens to be a NULL,
>    you'd get an APR_DBD_TYPE_NULL type and a NULL pointer.  That way,
>    you don't need to use the proposed apr_dbd_is_null() function unless
>    you're using the string-only API (i.e., the non-binary-data functions).

In the patches, I return APR_ENOENT from datum_get() if I find a null
value, so no need for apr_dbd_is_null() call. Except that I've forgotten
to do this for SQLite2... Sorry, I'll try to fix that.

> 2) Alas, for Oracle, there should probably also be the types
>    APR_DBD_TYPE_CLOB and APR_DBD_TYPE_BFILE, but I can add those
>    if I ever get around to implementing the Oracle version of this!  :-(

If that's what we need to do, sure, why not. You're the Oracle guy! :-)

> 3) Defining apr_dbd_blob in apr_dbd.h means it's going to wind up
>    fixed across all drivers.  Oracle would need an apr_dbd_clob too,
>    or at least a common apr_dbd_lob (since the type would be specified
>    elsewhere, but the data, length, table, and column need to be known
>    at bind time for all LOBs in Oracle, IIRC).
> 
>    Do you see any way an apr_dbd_lob could be made private to the driver,
>    and the required data fields in it queried from the driver by
>    a kind of reflection method?  At one point, I proposed something
>    like this, but I'll modify it a bit here:
> 
> typedef enum {
>     APR_DBD_FLAGS_TABLE_NAME,
>     APR_DBD_FLAGS_COLUMN_NAME
> } apr_dbd_flags_e;
> 
> APU_DECLARE(apr_dbd_flags_e)
>     apr_dbd_type_flags_get(const apr_dbd_driver_t *driver,
>                            apr_dbd_type_e type);
> 
>    I'm thinking here that if you knew you were passing in a particular
> type of bound argument after an apr_dbd_prepare(), you could query to
> find out what fields you need to pack, either in the old-style string
> argument or in the new-style binary structure.  Or it could be called
> apr_dbd_lob_flags_get() and return an apr_dbd_lob_flags_e value:
> that would clarify its usage at the expense of using the method for
> any future kind of non-LOB data type.

Yeah, I see that this would be a lot more generic (i.e. you'd
dynamically figure out what needs to be put in - similar to figuring out
the data types in datum_get()). I was thinking along the lines of
keeping it relatively simple, by having a cover-all structure (which, of
course, is a recipe for breaking binary compatibility when we need more
fields :-).

In other words, in _prepare, we get in %p<something>, which means CLOB,
LOB, FILE or any other additional types that Oracle (and other DBs) may
support.

Then, in the _p[b]query/select (since we know the type from _prepare),
we get ourselves just a regular apr_dbd_blob_t, but we know to call the
correct bind functions and use the correct parts of it (i.e. length, or
table name, table name etc.). The apr_dbd_blob_t is just a data delivery
mechanism. We could have others - for instance apr_dbd_file_t (for
APR_DBD_TYPE_BFILE), which could look something like this:

struct apr_dbd_file {
    apr_file_t *file;   /**< file to read the data from */
    apr_size_t size;    /**< size to read */
    apr_off_t  offset;  /**< offset to read from */
    const char *table;  /**< table name (used for Oracle) */
    const char *column; /**< column name (used for Oracle) */
};

Again, thanks for reviewing and I hope you find my answer useful.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

> Yet another drop of the patches and the test program. Be careful with  
> these, they may actually return some data to you :-)

   Firstly, Bojan, thank you for all the work on these patches!  I read
through them last week and reviewed quickly your latest updates, and
really, I don't see much to discuss -- it's all looking very close to
what I might hope for!

   Here are my few thoughts so far.  They may not be completely cogent.

1) The one thing that stood out for me was that apr_dbd_datum_get()
   seems to take an apr_dbd_type_e parameter.  The drivers seem to
   then format the result data from the DB into the requested type.
   This means that you can ask for an APR_DBD_TYPE_NULL, for example,
   regardless of what the data is.

   Now, I may very well be missing something here, but this seems
   backwards to me.  I would have thought that you'd want to pass
   an apr_dbd_type_e *type parameter instead, and then datum_get()
   would write into that location the type of the column, and
   write into the void *data parameter the actual data, formatted
   as one would expect for the returned type.  So if the column
   contains integers, you'd get a type of APR_DBD_TYPE_INT and
   a pointer to a signed int, and so forth.

   Further, if the data in that row for that column happens to be a NULL,
   you'd get an APR_DBD_TYPE_NULL type and a NULL pointer.  That way,
   you don't need to use the proposed apr_dbd_is_null() function unless
   you're using the string-only API (i.e., the non-binary-data functions).

   Does that make any sense?  I would have thought the DB would tell
   you the type, not the caller, is what it boils down to for me.
   But, I could well be off base here.

2) Alas, for Oracle, there should probably also be the types
   APR_DBD_TYPE_CLOB and APR_DBD_TYPE_BFILE, but I can add those
   if I ever get around to implementing the Oracle version of this!  :-(

3) Defining apr_dbd_blob in apr_dbd.h means it's going to wind up
   fixed across all drivers.  Oracle would need an apr_dbd_clob too,
   or at least a common apr_dbd_lob (since the type would be specified
   elsewhere, but the data, length, table, and column need to be known
   at bind time for all LOBs in Oracle, IIRC).

   Do you see any way an apr_dbd_lob could be made private to the driver,
   and the required data fields in it queried from the driver by
   a kind of reflection method?  At one point, I proposed something
   like this, but I'll modify it a bit here:

typedef enum {
    APR_DBD_FLAGS_TABLE_NAME,
    APR_DBD_FLAGS_COLUMN_NAME
} apr_dbd_flags_e;

APU_DECLARE(apr_dbd_flags_e)
    apr_dbd_type_flags_get(const apr_dbd_driver_t *driver,
                           apr_dbd_type_e type);

   I'm thinking here that if you knew you were passing in a particular
type of bound argument after an apr_dbd_prepare(), you could query to
find out what fields you need to pack, either in the old-style string
argument or in the new-style binary structure.  Or it could be called
apr_dbd_lob_flags_get() and return an apr_dbd_lob_flags_e value:
that would clarify its usage at the expense of using the method for
any future kind of non-LOB data type.

   OK, I'm off to sleep now ... sorry if any or all of that seems
way off the mark.  Thanks again!

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> Oh, and here is a new version of the patches

Yet another drop of the patches and the test program. Be careful with  
these, they may actually return some data to you :-)

You will notice that I tried to get the MySQL version of driver in  
line with the rest of the drivers in terms of persistance of what gets  
returned from get_entry() and datum_get(). I may have even gone  
overboard with copying, but chose to to err on the side of caution for  
now. The bucket brigade "streaming" of BLOBs in MySQL should  
(hopefully) actually do something now ;-)

PS. Again, a tarball due to size.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

Quoting Bojan Smojver <bo...@rexursive.com>:

> Awesome, thanks.

Oh, and here is a new version of the patches that fixes some of the  
bugs that I encountered when running the incomplete test program  
(inside the attached tarball). Hopefully, it'll eliminate some time  
wasting...

PS. The whole thing is in a tarball because it would be too big to  
mail to the list as plain attachments.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Bojan Smojver <bo...@rexursive.com>.

On Fri, 2006-08-25 at 11:05 -0700, Chris Darroch wrote:

> I'm going to have some travel
> next week so I'll take it with me to review on the plane trips.

Awesome, thanks.

-- 
Bojan

Re: DBD: Prepared statements, BLOBs etc.

Posted by Chris Darroch <ch...@pearsoncmg.com>.

Bojan Smojver wrote:

> Some time ago we had quite a few e-mails exchanged about DBD's prepared
> statement support and various ways of getting binary data in and out of
> prepared statements. So, I prepared these two patches to outline with
> code what I had in mind. You may find some of your suggestions in these
> patches - this is not by accident.
> 
> Before you attempt to compile and run this, let me put it bluntly: these
> patches DON'T WORK! They have not been tested at all - they are here
> just to express some of the ideas in code. The code in there may (and
> most likely does) have serious flaws, even to the point where you could
> be scratching you head thinking "what did he want with this". Hopefully,
> main ideas will remain visible.
> 
> What is in the patches? This (roughly):
> 
> - parsing of SQL to find parameters has been moved out of the drivers
> and into the upper layer, so it should be identical for all supported
> databases
> 
> - new functions apr_dbd_pb[v]query/select() have been introduced, which
> take "binary" (void *) arguments instead of const char *
> 
> - new function apr_dbd_datum_get() has been introduced to fetch binary
> data from prepared queries
> 
> - structure apr_dbd_blob_t has been defined to pass BLOBs into the query
> 
> - SQL to C types have been enumerated
> 
> - some situations of in-driver duplication of code have been avoided
> (this is mostly as a consequence of new functions being introduced)
> 
> - "streaming" of BLOB data using a bucket brigade from MySQL (now there
> is some code that *definitely* won't work :-)
> 
> - the code should be binary compatible with 1.2.x (i.e. no need to go to
> APR-Util 2.x to have this)
> 
> - lots and lots of new bugs ;-)
> 
> You will find that I haven't done any work on the Oracle driver. It is
> simply because I really have no idea about the API. But, I tried to
> incorporate some of the concerns raised into the patches (i.e. in terms
> of BLOBs). I guess SQLite3, PGSQL and MySQL stuff should be sufficient
> enough to represent the ideas.
> 
> Criticise away - this is what the patches are for!

   Wow, this is quite a lot of work!  I'm going to have some travel
next week so I'll take it with me to review on the plane trips.  I'll
have to remind myself -- again -- what Oracle's various weirdnesses
are with regard to LOBs.  :-)

   Thanks!

Chris.

-- 
GPG Key ID: 366A375B
GPG Key Fingerprint: 485E 5041 17E1 E2BB C263  E4DE C8E3 FA36 366A 375B