You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Frank Griffin <ft...@roadrunner.com> on 2008/02/29 00:53:09 UTC

SQL/MED

To recap...

A while back, I asked about accessing foreign data from Derby.  Rick
very helpfully pointed me to Table Functions, and I've created a
prototype using Table Functions which accesses my foreign engine.  This
all works very nicely, within the design capabilities of Table Functions.

I had pointed out that without some way to push predicates, projection
and selection would be crippled, and lookup-type activities on large
result sets would be pretty inefficient.  Rick responded that for that
type of access to work well, what I probably wanted was SQL/MED.

So I've gotten the SQL/MED spec and poked through it a bit.  It looks
like exactly what I'd like.

Has there been any discussion of implementing SQL/MED in Derby ?  If so,
was there any resolution ?  From what I can see googling around, I can't
find any indication that any SQL engine actually provides a reference
implementation of SQL/MED, in spite of the fact that it's been in the
spec since 1999 and was enhanced in 2003.

I would be  interested in working on such an implementation in Derby,
assuming that the idea hasn't already been discussed and discarded.

>From what I can tell, this would require DDL syntax pretty much like
what Table Functions required.  CREATE FOREIGN TABLE would be very
similar to CREATE FUNCTION (TABLE) and the others are pretty simple (as
they don't require column definitions, and pretty much just specify
software and connection linkages).

It would also require walking the SQL parse tree and creating the Value
Expressions which describe the SELECT elements, the WHERE clause
components, and so forth. 

Derby would be required to initialize the MED wrapper and pass a Request
Handle related to the actual query Derby would like the foreign wrapper
to execute.  The wrapper uses this Handle to query Derby about the
individual components of the Request, and accepts those that it can
handle.  Derby then makes its own arrangements to handle any screening
that the wrapper can't handle, and generates an execution plan which
involves calling the wrapper to do the parts of which it has declared
itself capable.

Beyond that, the support is very much like what Table Functions already
supports.  Derby activates the individual query components handled by
the wrapper, and navigates among the rows using a model pretty much like
the JDBC ResultSet model used by Table Functions.

Any interest in this, or am I beating a previously-deceased horse ?



Re: SQL/MED

Posted by Rick Hillegas <Ri...@Sun.COM>.
Bryan Pendleton wrote:
>>>> So I've gotten the SQL/MED spec and poked through it a bit.  It looks
>>>> like exactly what I'd like.
>
> What is SQL/MED? Is there a web site with background material, etc?
>
> thanks,
>
> bryan
>
Hi Bryan,

SQL/MED seems to be an ANSI approach to federated databases. It's 
described in part 9 of the SQL standard. If you google up "sql med", the 
first result is a pdf document describing SQL/MED, written by Jim 
Melton. That may or may not be helpful.

Regards,
-Rick

Re: SQL/MED

Posted by Bryan Pendleton <bp...@amberpoint.com>.
>>> So I've gotten the SQL/MED spec and poked through it a bit.  It looks
>>> like exactly what I'd like.

What is SQL/MED? Is there a web site with background material, etc?

thanks,

bryan


Re: SQL/MED

Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Frank,

One other quick piece of advice: The devil is going to be in the details 
here. It's probably a good idea to list the (families of) queries you 
want to handle. Then people can help you understand whether your plan 
will really result in a performance boost with incremental modifications 
to Derby and without SQL/MED support on the remote databases.

Hope this helps,
-Rick

Rick Hillegas wrote:
> Hi Frank,
>
> I'm not aware of any discussion of this topic on derby-dev. I have 
> used nabble (http://www.nabble.com/Apache-Database-f105.html) to 
> search the email archive for "sql med" and nothing came up. Your 
> proposal sounds like a good basis for a first implementation. Here are 
> a couple more thoughts:
>
> 1) This could grow arbitrarily complicated, depending on how much 
> heavy lifting you want the optimizer to do for you. With a little 
> luck, you could probably trick the optimizer into treating the foreign 
> tables like Table Functions. That would skirt some of the interesting 
> optimizer issues.
>
> 2) A later implementation could get the optimizer involved and you 
> could build some interesting cost model for remote tables.
>
> 3) Alternatively, you might be able to get an optimizer expert like 
> Army involved early on. He might be able to help you divide and 
> conquer this one.
>
> You might want to start a wiki page on this topic and take it from there.
>
> Regards,
> -Rick
>
> Frank Griffin wrote:
>> To recap...
>>
>> A while back, I asked about accessing foreign data from Derby.  Rick
>> very helpfully pointed me to Table Functions, and I've created a
>> prototype using Table Functions which accesses my foreign engine.  This
>> all works very nicely, within the design capabilities of Table 
>> Functions.
>>
>> I had pointed out that without some way to push predicates, projection
>> and selection would be crippled, and lookup-type activities on large
>> result sets would be pretty inefficient.  Rick responded that for that
>> type of access to work well, what I probably wanted was SQL/MED.
>>
>> So I've gotten the SQL/MED spec and poked through it a bit.  It looks
>> like exactly what I'd like.
>>
>> Has there been any discussion of implementing SQL/MED in Derby ?  If so,
>> was there any resolution ?  From what I can see googling around, I can't
>> find any indication that any SQL engine actually provides a reference
>> implementation of SQL/MED, in spite of the fact that it's been in the
>> spec since 1999 and was enhanced in 2003.
>>
>> I would be  interested in working on such an implementation in Derby,
>> assuming that the idea hasn't already been discussed and discarded.
>>
>> From what I can tell, this would require DDL syntax pretty much like
>> what Table Functions required.  CREATE FOREIGN TABLE would be very
>> similar to CREATE FUNCTION (TABLE) and the others are pretty simple (as
>> they don't require column definitions, and pretty much just specify
>> software and connection linkages).
>>
>> It would also require walking the SQL parse tree and creating the Value
>> Expressions which describe the SELECT elements, the WHERE clause
>> components, and so forth.
>> Derby would be required to initialize the MED wrapper and pass a Request
>> Handle related to the actual query Derby would like the foreign wrapper
>> to execute.  The wrapper uses this Handle to query Derby about the
>> individual components of the Request, and accepts those that it can
>> handle.  Derby then makes its own arrangements to handle any screening
>> that the wrapper can't handle, and generates an execution plan which
>> involves calling the wrapper to do the parts of which it has declared
>> itself capable.
>>
>> Beyond that, the support is very much like what Table Functions already
>> supports.  Derby activates the individual query components handled by
>> the wrapper, and navigates among the rows using a model pretty much like
>> the JDBC ResultSet model used by Table Functions.
>>
>> Any interest in this, or am I beating a previously-deceased horse ?
>>
>>
>>   
>


Re: SQL/MED

Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Frank,

Some comments inline:

Frank Griffin wrote:
> Rick Hillegas wrote:
>   
>> 1) This could grow arbitrarily complicated, depending on how much
>> heavy lifting you want the optimizer to do for you. With a little
>> luck, you could probably trick the optimizer into treating the foreign
>> tables like Table Functions. That would skirt some of the interesting
>> optimizer issues.
>>
>> 2) A later implementation could get the optimizer involved and you
>> could build some interesting cost model for remote tables.
>>
>> 3) Alternatively, you might be able to get an optimizer expert like
>> Army involved early on. He might be able to help you divide and
>> conquer this one.
>>     
>
> The Derby and SQL/MED optimizer support look pretty similar.  SQL/MED
> has GetReplyFirstCost(), GetReplyExecCost(), and GetReplyReExecCost()
> interface functions that the wrapper (i.e. Table Function) supplies to
> the server.  IIRC Table Functions have the same concepts.
>
> For the initial pass, I sort of assumed that the ability to push
> predicates to a foreign server capable of handling them would minimize
> the effect of aggressive optimization within Derby, no ?
>
>
>   
>> You might want to start a wiki page on this topic and take it from there.
>>     
>
> Will do.  Is there a page template I should use ?  Or a project proposal
> page format used elsewhere ?
>   
People have been hanging their proposals here: 
http://wiki.apache.org/db-derby/DerbyProposals
>
> As a quick summary, here's the way I *think* SQL/MED works, based on
> some initial reading....
>
> In passthrough mode, the server (Derby) can send the SQL for the
> subquery directly to the wrapper; this obviously only works for foreign
> SQL servers.  In decomposition mode, the server parses the SQL, and
> separates out the subqueries directed at tables controlled by the
> foreign wrapper.  The server makes interface routines available to the
> wrapper which allow it to navigate the subquery parse tree in the
> SQL/MED format.
>
> There are almost no data structures involved in this.  It's all
> navigational interface calls which return individual bits of
> information.  That means that the SQL/MED parse model and the server
> parse model don't have to agree, as long as the server can supply the
> information requested by the interface routines.
>
> Conceptually, there is something called a Value Expression that
> corresponds to a syntax token (simple or complex) in the subquery.  If
> the token is complex, the VE has child VEs, and VEs get broken down to
> child VEs until only the four basic VE types --- COLUMN NAME, CONSTANT,
> OPERATOR, and PARAMETER --- are involved.  The SQL/MED model appears to
> assume that all of these VEs, whether for tokens in a SELECT list,
> expressions in a WHERE or ORDER BY, or whatever, are identified and
> strung out in linear fashion, sort of like an array, at least for
> Replies (see below).  The interface calls allow you to request the Nth
> VE of a certain type, e.g. the Nth item in the SELECT list or the Nth
> ORDER BY item.
>
> The server provides the wrapper with a Request Handle that represents a
> subquery.  The wrapper navigates through the subquery VEs and creates a
> Reply Handle which is conceptually a copy of the Request with the VEs
> the wrapper can't handle removed.  Just as the server provides
> navigational routines to the wrapper which allow it to walk the Request,
> the wrapper must provide a parallel set of navigational routines to the
> server to allow it to walk the Reply.  I'm not yet sure why, but the
> Request VEs are represented by "handles", while the Reply VEs are
> represented by ordinal numbers (see the array reference above);
> otherwise the set of navigational routines for Requests is pretty much a
> mirror image of the set for Replies.
>
> The server is responsible for detecting any differences between the
> Request and Reply, and changing its execution plan to pick up the
> wrapper's slack (so to speak). It then requests the wrapper to Open the
> various Replies involved, and Iterates through them, using concepts very
> similar to JDBC ResultSet, and incorporating the retrieved data into its
> execution plan very much like data supplied by a Table Function.
>
> I see this affecting several areas:
>
> 1) Derby DDL will need to be extended to include the various CREATE
> FOREIGN XXXX statements.  I'd need to be pointed to the changes that
> were made to add CREATE FUNCTION support for Table Functions.
>
> 2) Navigational routines need to be added which expose the Derby parse
> tree for subqueries to wrappers.  This should be new code, but I'd need
> to be pointed to code or docs for the parse tree model.
>
> 3) Derby support for incorporating wrappers into its execution plans
> *should* be quite similar to whatever was done for Table Functions.  The
> main difference will be new code to  analyze a Reply using the wrapper's
> navigational routines, compare it to the original Request, and remove
> any Derby predicate handling that was successfully pushed to the wrapper.
>
> 4) Data Access while iterating through a ResultSet is conceptually the
> same as JDBC, but the mechanics are not identical, since SQL/MED was
> designed to be language-neutral and Java isn't one of the languages
> specifically mentioned.  There will probably need to be new access code
> written to retrieve rows/columns from wrappers, and I'd need to know
> where to insert this.
>
> While I intend to make the architecture extensible to the full spec, I
> was thinking of limiting support in the first pass to SELECTs only.  I
> think the spec allows any sort of SQL to be passed to a wrapper, but I
> haven't gotten that far yet.
>
>   
Lots of good stuff. Limiting the scope of what you want to do and 
chunking the work into small increments are good instincts.

Regards,
-Rick


Re: SQL/MED

Posted by Frank Griffin <ft...@roadrunner.com>.
Rick Hillegas wrote:
> 1) This could grow arbitrarily complicated, depending on how much
> heavy lifting you want the optimizer to do for you. With a little
> luck, you could probably trick the optimizer into treating the foreign
> tables like Table Functions. That would skirt some of the interesting
> optimizer issues.
>
> 2) A later implementation could get the optimizer involved and you
> could build some interesting cost model for remote tables.
>
> 3) Alternatively, you might be able to get an optimizer expert like
> Army involved early on. He might be able to help you divide and
> conquer this one.

The Derby and SQL/MED optimizer support look pretty similar.  SQL/MED
has GetReplyFirstCost(), GetReplyExecCost(), and GetReplyReExecCost()
interface functions that the wrapper (i.e. Table Function) supplies to
the server.  IIRC Table Functions have the same concepts.

For the initial pass, I sort of assumed that the ability to push
predicates to a foreign server capable of handling them would minimize
the effect of aggressive optimization within Derby, no ?


>
> You might want to start a wiki page on this topic and take it from there.

Will do.  Is there a page template I should use ?  Or a project proposal
page format used elsewhere ?


As a quick summary, here's the way I *think* SQL/MED works, based on
some initial reading....

In passthrough mode, the server (Derby) can send the SQL for the
subquery directly to the wrapper; this obviously only works for foreign
SQL servers.  In decomposition mode, the server parses the SQL, and
separates out the subqueries directed at tables controlled by the
foreign wrapper.  The server makes interface routines available to the
wrapper which allow it to navigate the subquery parse tree in the
SQL/MED format.

There are almost no data structures involved in this.  It's all
navigational interface calls which return individual bits of
information.  That means that the SQL/MED parse model and the server
parse model don't have to agree, as long as the server can supply the
information requested by the interface routines.

Conceptually, there is something called a Value Expression that
corresponds to a syntax token (simple or complex) in the subquery.  If
the token is complex, the VE has child VEs, and VEs get broken down to
child VEs until only the four basic VE types --- COLUMN NAME, CONSTANT,
OPERATOR, and PARAMETER --- are involved.  The SQL/MED model appears to
assume that all of these VEs, whether for tokens in a SELECT list,
expressions in a WHERE or ORDER BY, or whatever, are identified and
strung out in linear fashion, sort of like an array, at least for
Replies (see below).  The interface calls allow you to request the Nth
VE of a certain type, e.g. the Nth item in the SELECT list or the Nth
ORDER BY item.

The server provides the wrapper with a Request Handle that represents a
subquery.  The wrapper navigates through the subquery VEs and creates a
Reply Handle which is conceptually a copy of the Request with the VEs
the wrapper can't handle removed.  Just as the server provides
navigational routines to the wrapper which allow it to walk the Request,
the wrapper must provide a parallel set of navigational routines to the
server to allow it to walk the Reply.  I'm not yet sure why, but the
Request VEs are represented by "handles", while the Reply VEs are
represented by ordinal numbers (see the array reference above);
otherwise the set of navigational routines for Requests is pretty much a
mirror image of the set for Replies.

The server is responsible for detecting any differences between the
Request and Reply, and changing its execution plan to pick up the
wrapper's slack (so to speak). It then requests the wrapper to Open the
various Replies involved, and Iterates through them, using concepts very
similar to JDBC ResultSet, and incorporating the retrieved data into its
execution plan very much like data supplied by a Table Function.

I see this affecting several areas:

1) Derby DDL will need to be extended to include the various CREATE
FOREIGN XXXX statements.  I'd need to be pointed to the changes that
were made to add CREATE FUNCTION support for Table Functions.

2) Navigational routines need to be added which expose the Derby parse
tree for subqueries to wrappers.  This should be new code, but I'd need
to be pointed to code or docs for the parse tree model.

3) Derby support for incorporating wrappers into its execution plans
*should* be quite similar to whatever was done for Table Functions.  The
main difference will be new code to  analyze a Reply using the wrapper's
navigational routines, compare it to the original Request, and remove
any Derby predicate handling that was successfully pushed to the wrapper.

4) Data Access while iterating through a ResultSet is conceptually the
same as JDBC, but the mechanics are not identical, since SQL/MED was
designed to be language-neutral and Java isn't one of the languages
specifically mentioned.  There will probably need to be new access code
written to retrieve rows/columns from wrappers, and I'd need to know
where to insert this.

While I intend to make the architecture extensible to the full spec, I
was thinking of limiting support in the first pass to SELECTs only.  I
think the spec allows any sort of SQL to be passed to a wrapper, but I
haven't gotten that far yet.


Re: SQL/MED

Posted by Rick Hillegas <Ri...@Sun.COM>.
Hi Frank,

I'm not aware of any discussion of this topic on derby-dev. I have used 
nabble (http://www.nabble.com/Apache-Database-f105.html) to search the 
email archive for "sql med" and nothing came up. Your proposal sounds 
like a good basis for a first implementation. Here are a couple more 
thoughts:

1) This could grow arbitrarily complicated, depending on how much heavy 
lifting you want the optimizer to do for you. With a little luck, you 
could probably trick the optimizer into treating the foreign tables like 
Table Functions. That would skirt some of the interesting optimizer issues.

2) A later implementation could get the optimizer involved and you could 
build some interesting cost model for remote tables.

3) Alternatively, you might be able to get an optimizer expert like Army 
involved early on. He might be able to help you divide and conquer this one.

You might want to start a wiki page on this topic and take it from there.

Regards,
-Rick

Frank Griffin wrote:
> To recap...
>
> A while back, I asked about accessing foreign data from Derby.  Rick
> very helpfully pointed me to Table Functions, and I've created a
> prototype using Table Functions which accesses my foreign engine.  This
> all works very nicely, within the design capabilities of Table Functions.
>
> I had pointed out that without some way to push predicates, projection
> and selection would be crippled, and lookup-type activities on large
> result sets would be pretty inefficient.  Rick responded that for that
> type of access to work well, what I probably wanted was SQL/MED.
>
> So I've gotten the SQL/MED spec and poked through it a bit.  It looks
> like exactly what I'd like.
>
> Has there been any discussion of implementing SQL/MED in Derby ?  If so,
> was there any resolution ?  From what I can see googling around, I can't
> find any indication that any SQL engine actually provides a reference
> implementation of SQL/MED, in spite of the fact that it's been in the
> spec since 1999 and was enhanced in 2003.
>
> I would be  interested in working on such an implementation in Derby,
> assuming that the idea hasn't already been discussed and discarded.
>
> From what I can tell, this would require DDL syntax pretty much like
> what Table Functions required.  CREATE FOREIGN TABLE would be very
> similar to CREATE FUNCTION (TABLE) and the others are pretty simple (as
> they don't require column definitions, and pretty much just specify
> software and connection linkages).
>
> It would also require walking the SQL parse tree and creating the Value
> Expressions which describe the SELECT elements, the WHERE clause
> components, and so forth. 
>
> Derby would be required to initialize the MED wrapper and pass a Request
> Handle related to the actual query Derby would like the foreign wrapper
> to execute.  The wrapper uses this Handle to query Derby about the
> individual components of the Request, and accepts those that it can
> handle.  Derby then makes its own arrangements to handle any screening
> that the wrapper can't handle, and generates an execution plan which
> involves calling the wrapper to do the parts of which it has declared
> itself capable.
>
> Beyond that, the support is very much like what Table Functions already
> supports.  Derby activates the individual query components handled by
> the wrapper, and navigates among the rows using a model pretty much like
> the JDBC ResultSet model used by Table Functions.
>
> Any interest in this, or am I beating a previously-deceased horse ?
>
>
>