Posted to dev@arrow.apache.org by David Li <li...@apache.org> on 2022/06/30 20:35:07 UTC

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Reviving this discussion: would people be interested in seeing a sketched-out CommandSubstraitQuery et al.?

Additionally, while working on ADBC, I realized: does Flight SQL need explicit Commit/Rollback commands? This would presumably be necessary if we want to build ODBC/JDBC drivers on top, since those standards have explicit commands, and Flight SQL doesn't have the luxury of a driver to issue database-specific SQL to implement these.

It would also then be good to make explicit the statefulness of connections in Flight SQL. While that is sort of an obvious constraint, it is at odds with how gRPC is usually used (especially in the presence of load balancing).
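
One way to picture explicit Commit/Rollback without binding a transaction to a single gRPC connection is to give the client an opaque transaction handle that it attaches to later requests. The sketch below is a toy in-memory model, not Flight SQL's actual API: `ToyTransactionServer`, `begin`, `execute`, and `end` are hypothetical names chosen only for illustration.

```python
import uuid


class ToyTransactionServer:
    """In-memory stand-in for a server; every name here is hypothetical."""

    def __init__(self):
        self.committed = {}  # durable key/value state
        self.pending = {}    # opaque handle -> staged writes

    def begin(self) -> bytes:
        # The handle is opaque to the client, which only echoes it back.
        handle = uuid.uuid4().bytes
        self.pending[handle] = {}
        return handle

    def execute(self, handle: bytes, key: str, value: int) -> None:
        # Writes are staged under the handle, not under a connection.
        self.pending[handle][key] = value

    def end(self, handle: bytes, commit: bool) -> None:
        staged = self.pending.pop(handle)
        if commit:
            self.committed.update(staged)
        # On rollback the staged writes are simply dropped.


server = ToyTransactionServer()

txn = server.begin()
server.execute(txn, "a", 1)
server.end(txn, commit=True)

txn2 = server.begin()
server.execute(txn2, "b", 2)
server.end(txn2, commit=False)

print(server.committed)
```

Because the handle travels with each request, the state belongs to whichever backend owns the handle rather than to the connection. A load balancer would still have to route by handle or share that state, which is the kind of constraint that would need to be made explicit.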

On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
> Got it, thank you David!
> I started prototyping the implementation last night, hopefully I will make
> some good progress and have something basic functioning soon.
>
> RE: The metadata thing -- I think both Calcite and Teiid have solid
> interfaces for defining what capabilities a datasource has.
> https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
>
> It's probably not possible to make something universal, but it seems like
> you could get pretty close to most common functionality/capabilities
>
>
> On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <ky...@bitquilltech.com.invalid>
> wrote:
>
>> Yes, we should, where possible, avoid any one-off metadata. This is where
>> other standards fail, in that applications must be custom built for each
>> data source; if we standardize the metadata, then applications can at least
>> be built to adapt.
>>
>> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <li...@apache.org> wrote:
>>
>> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's use, so
>> > the application can use others for its own purposes. That said if they
>> seem
>> > commonly applicable maybe we should try to standardize them.
>> >
>> > I think what you are doing should be reasonable. You may not need _all_
>> of
>> > the capabilities in Flight SQL for this (e.g. all the various metadata
>> > calls, or prepared statements, perhaps) but I don't see why it wouldn't
>> > work for you.
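
As a rough illustration of the reserved-range idea, an application could keep its own capability IDs above whatever range Flight SQL reserves. This is only a sketch: the bound of 10,000 is an assumption that should be checked against FlightSql.proto, and the two capability IDs and `register_custom_info` are hypothetical.

```python
# Assumption: IDs below this bound are reserved for Flight SQL itself.
# Check the real bound in FlightSql.proto before relying on it.
RESERVED_UPPER_BOUND = 10_000

# Hypothetical application-defined capability IDs.
SUPPORTS_MULTI_DISTINCT_AGG = 10_001
SUPPORTS_ALIASED_SELF_JOIN = 10_002


def register_custom_info(registry: dict, info_id: int, value) -> None:
    """Add an application-defined entry, refusing IDs in the reserved range."""
    if info_id < RESERVED_UPPER_BOUND:
        raise ValueError(f"ID {info_id} is in the range reserved for Flight SQL")
    registry[info_id] = value


registry = {}
register_custom_info(registry, SUPPORTS_MULTI_DISTINCT_AGG, True)
register_custom_info(registry, SUPPORTS_ALIASED_SELF_JOIN, False)
```

A client that does not know these IDs can ignore them, which is why commonly useful ones may be worth standardizing instead.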
>> >
>> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
>> > > To touch on the question about supported features -- is it possible to
>> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
>> > > Say that you want to represent some set of behaviors that FlightSQL
>> > > services can support.
>> > >
>> > > Stuff like "Supports grouping by multiple distinct aggregates",
>> "Supports
>> > > self-joins on aliased tables" etc
>> > > This is going to be unique to each implementation, but I couldn't
>> > determine
>> > > whether there was a way to express arbitrary capabilities
>> > >
>> > > Also, in case it's helpful I put together an ASCII diagram of what I'm
>> > > trying to do with FlightSQL
>> > > If anyone has a moment, would appreciate input on whether it's
>> feasible/a
>> > > good idea
>> > >
>> > > https://pastebin.com/raw/VF2r0F3f
>> > >
>> > > Thank you =)
>> > >
>> > >
>> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <li...@apache.org> wrote:
>> > >
>> > >> We could also add say CommandSubstraitQuery as a distinct message, and
>> > >> older servers would just reject it as an unknown request type.
>> > >>
>> > >> -David
>> > >>
>> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
>> > >> >>
>> > >> >> 1. How does a server report that it supports each command type? Initial
>> > >> >> thought is a property in GetSqlInfo.
>> > >> >
>> > >> >
>> > >> > This sounds reasonable.
>> > >> >
>> > >> >
>> > >> >> What happens to client code written prior to changing the command type
>> > >> >> to be a oneof field? Same for servers.
>> > >> >
>> > >> >
>> > >> > It is transparent to older clients (I'm 99% sure the wire protocol
>> > >> > doesn't change). Servers are a little harder. The one saving grace is
>> > >> > that I don't think an empty/not-present SQL string would be something
>> > >> > most servers could handle, so they would probably error with something
>> > >> > that, while not obvious, would give a clue to the clients (but hopefully
>> > >> > this would be a non-issue because clients wishing to use this feature
>> > >> > would check the capabilities first).
>> > >> >
>> > >> > -Micah
>> > >> >
>> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <jamesd@bitquilltech.com
>> > >> .invalid>
>> > >> > wrote:
>> > >> >
>> > >> >> It sounds like an interesting and useful project to use Substrait as an
>> > >> >> alternative to SQL strings.
>> > >> >>
>> > >> >> Important aspects to spec out are:
>> > >> >> 1. How does a server report that it supports each command type? Initial
>> > >> >> thought is a property in GetSqlInfo.
>> > >> >> 2. What happens to client code written prior to changing the command type
>> > >> >> to be a oneof field? Same for servers.
>> > >> >> More generally, how should backward compatibility work, and what should
>> > >> >> happen if a client sends an unsupported command type to a server?
>> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait structures?
>> > >> >>
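
Items 1 and 2 above can be sketched as a client that checks an advertised capability first and falls back to a SQL string otherwise. This is a toy model under stated assumptions: `ToyServer`, the `substrait_supported` flag, and the command dictionaries are hypothetical stand-ins, not Flight SQL messages.

```python
class UnsupportedCommand(Exception):
    """Raised by the toy server for request types it does not know."""


class ToyServer:
    def __init__(self, supports_substrait: bool):
        self.supports_substrait = supports_substrait

    def get_info(self) -> dict:
        # Hypothetical capability flag, standing in for a GetSqlInfo property.
        return {"substrait_supported": self.supports_substrait}

    def handle(self, command: dict) -> str:
        if command["type"] == "sql":
            return f"ran SQL: {command['query']}"
        if command["type"] == "substrait" and self.supports_substrait:
            return f"ran plan of {len(command['plan'])} bytes"
        # An older server rejects request types it does not recognize.
        raise UnsupportedCommand(command["type"])


def run_query(server: ToyServer, sql: str, plan: bytes) -> str:
    """Prefer the structured plan; fall back to SQL if it is not advertised."""
    if server.get_info().get("substrait_supported", False):
        return server.handle({"type": "substrait", "plan": plan})
    return server.handle({"type": "sql", "query": sql})


print(run_query(ToyServer(True), "SELECT 1", b"\x00\x01"))
print(run_query(ToyServer(False), "SELECT 1", b"\x00\x01"))
```

Checking the capability up front means the rejection path is only hit by clients that skip the check, which matches the expectation that the error case should rarely matter in practice.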
>> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <ra...@gmail.com>
>> > >> wrote:
>> > >> >>
>> > >> >> > @James Duong <ja...@bitquilltech.com>
>> > >> >> >
>> > >> >> > You are absolutely right, I realized this and confirmed whether
>> > this
>> > >> >> > would be possible with Jacques to double-check.
>> > >> >> > It would amount to what I might call "dollar-store Substrait."
>> It's
>> > >> not
>> > >> >> > elegant or a good solution, but definitely presents a good
>> > duct-tape
>> > >> hack
>> > >> >> > and is a crafty idea.
>> > >> >> >
>> > >> >> > I agree with Jacques -- when you think about FlightSQL, what you
>> > are
>> > >> >> > attempting with a query isn't necessarily SQL, but a general
>> > >> data-compute
>> > >> >> > operation.
>> > >> >> > SQL just so happens to be a fairly universal way to express them,
>> > >> with an
>> > >> >> > ANSI standard, but FlightSQL doesn't recognize any particular
>> > subset
>> > >> of
>> > >> >> it
>> > >> >> > and for all intents and purposes it doesn't matter what the
>> > operation
>> > >> >> > string contains.
>> > >> >> >
>> > >> >> > Substrait would make a fantastic logical next-feature because
>> it's
>> > >> >> > targeted as a specification for expressing relational algebra and
>> > >> >> > data-compute operations
>> > >> >> > This more-or-less equates to SQL strings (in my mind at least)
>> > with a
>> > >> >> much
>> > >> >> > better toolkit and Dev UX. If there is anything I can do to help
>> > move
>> > >> >> this
>> > >> >> > forward, please let me know because I am extremely motivated to
>> do
>> > so.
>> > >> >> >
>> > >> >> > @David Li <gi...@lidavidm.me>
>> > >> >> >
>> > >> >> > Also agreed. Substrait is put together by folks much smarter than
>> > >> myself,
>> > >> >> > and if I had to hedge my bets, I'd put money on it being the
>> > future of
>> > >> >> > data-compute interop.
>> > >> >> > I would love nothing more than to adopt this technology and push
>> it
>> > >> >> along.
>> > >> >> >
>> > >> >> > Your project does sound interesting - basically, it sounds like a
>> > >> tabular
>> > >> >> >> data storage service with query pushdown?
>> > >> >> >>
>> > >> >> >
>> > >> >> > Yeah this is more or less the details of it (my personal email,
>> > with
>> > >> >> > discretion assumed, is always open)
>> > >> >> >
>> > >> >> > Imagine an environment where a backend wants to advertise some
>> > kind of
>> > >> >> > schema/data catalog
>> > >> >> >
>> > >> >> > And then a central service introspects these backends, and
>> > dynamically
>> > >> >> > generates an API from the data catalogues/schemas, where requests
>> > get
>> > >> >> > proxied to the underlying backend service for each schema to
>> > actually
>> > >> be
>> > >> >> > executed
>> > >> >> >
>> > >> >> > In text, the flow would look something like:
>> > >> >> >
>> > >> >> >
>> > >> >> >        <----> Data Provider Backend 0
>> > >> >> > Client <-----> Central Service <---> Generated API <---->
>> > >> Data-Provider
>> > >> >> > Backend 1
>> > >> >> >
>> > >> >> >        <----> Data Provider Backend 2
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <li...@apache.org>
>> > wrote:
>> > >> >> >
>> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an
>> > >> alternative to
>> > >> >> >> Substrait, at least one that isn't even more nascent or one
>> that's
>> > >> very
>> > >> >> >> tied to a particular language, so perhaps it might be better to
>> > get
>> > >> >> >> involved in Substrait and see if it suits your needs?
>> Convincing a
>> > >> team
>> > >> >> to
>> > >> >> >> try something new can be hard, though, and it is somewhat of a
>> > moving
>> > >> >> >> target - but Flight SQL is in a similar spot, I think, as it's
>> > still
>> > >> >> >> getting enhancements.
>> > >> >> >>
>> > >> >> >> Your project does sound interesting - basically, it sounds like
>> a
>> > >> >> tabular
>> > >> >> >> data storage service with query pushdown?
>> > >> >> >>
>> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
>> > >> >> >> > James, I agree that you could use JSON but that feels a bit
>> > hacky
>> > >> >> >> > (mis-use
>> > >> >> >> > of the paradigm). Instead, I'd really like to do something
>> like
>> > >> David
>> > >> >> is
>> > >> >> >> > suggesting: support Substrait as an alternative to a SQL
>> string.
>> > >> >> >> > Something like this:
>> > >> >> >> >
>> > >> >> >>
>> > >> >>
>> > >>
>> >
>> https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
>> > >> >> >> >
>> > >> >> >> > It would be great if someone wanted to pick this up. It would
>> > be a
>> > >> >> nice
>> > >> >> >> > enhancement to FlightSQL (and provide a structured way to
>> > express
>> > >> >> >> > operations).
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <
>> > >> jamesd@bitquilltech.com
>> > >> >> >> .invalid>
>> > >> >> >> > wrote:
>> > >> >> >> >
>> > >> >> >> >> In the same way that you could write an ODBC driver that
>> takes
>> > in
>> > >> >> text
>> > >> >> >> >> that's not SQL, you could write a Flight SQL server that
>> takes
>> > in
>> > >> >> text
>> > >> >> >> >> that's JSON.
>> > >> >> >> >> Flight SQL doesn't parse the query, so you could create
>> > commands
>> > >> that
>> > >> >> >> are
>> > >> >> >> >> just JSON text.
>> > >> >> >> >>
>> > >> >> >> >> Is that the only bit you need, Gavin?
>> > >> >> >> >>
>> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <
>> > ray.gavin97@gmail.com>
>> > >> >> >> wrote:
>> > >> >> >> >>
>> > >> >> >> >> > I am enthusiastic about Substrait and have followed its progress
>> > >> >> >> >> > eagerly =D
>> > >> >> >> >> >
>> > >> >> >> >> > When I presented it as a tentative option, there were
>> > >> reservations
>> > >> >> >> >> because
>> > >> >> >> >> > of the project/spec being young and the functionality still
>> > >> being
>> > >> >> >> >> > fleshed out.
>> > >> >> >> >> > I think if I were having this conversation in say, 8-16
>> > months,
>> > >> it
>> > >> >> >> would
>> > >> >> >> >> > have been an easy choice, no doubt.
>> > >> >> >> >> >
>> > >> >> >> >> > On a public mailing list (and I can share more details in
>> > >> private
>> > >> >> if
>> > >> >> >> >> you're
>> > >> >> >> >> > curious), the gist of it is this:
>> > >> >> >> >> >
>> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for
>> > expressing
>> > >> >> data
>> > >> >> >> >> > compute operations between services would be a useful thing
>> > to
>> > >> have
>> > >> >> >> >> > (Especially if it's language-agnostic)
>> > >> >> >> >> >
>> > >> >> >> >> > The goal is for an "implementing service" to have:
>> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to me")
>> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform this
>> > >> operation
>> > >> >> >> on
>> > >> >> >> >> your
>> > >> >> >> >> > data")
>> > >> >> >> >> >
>> > >> >> >> >> > With FlightSQL this is possible I believe, but it requires
>> > the
>> > >> >> >> operation
>> > >> >> >> >> to
>> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
>> > >> >> >> >> >
>> > >> >> >> >> > Working with some programmatic, structured object that has
>> > the
>> > >> same
>> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query would
>> > >> have,
>> > >> >> >> would
>> > >> >> >> >> be
>> > >> >> >> >> > a better experience
>> > >> >> >> >> > (Jacques is on to something here!)
>> > >> >> >> >> >
>> > >> >> >> >> > This interface between services would be somewhat the
>> > >> equivalent of
>> > >> >> >> an
>> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed library
>> > for
>> > >> >> >> >> expressing
>> > >> >> >> >> > and building-up query/data-compute ops.
>> > >> >> >> >> >
>> > >> >> >> >> >
>> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <
>> lidavidm@apache.org
>> > >
>> > >> >> wrote:
>> > >> >> >> >> >
>> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
>> > >> >> >> >> > >
>> > >> >> >> >> > > Which is being worked on by several people, including
>> Arrow
>> > >> >> >> community
>> > >> >> >> >> > > members.
>> > >> >> >> >> > >
>> > >> >> >> >> > > It might be interesting to generalize Flight SQL to include
>> > >> >> >> >> > > support for Substrait. I'm curious what your application is,
>> > >> >> >> >> > > if you're able to share more.
>> > >> >> >> >> > >
>> > >> >> >> >> > > -David
>> > >> >> >> >> > >
>> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
>> > >> >> >> >> > > > Hiya,
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > I am drafting a proposal for a way to enable services
>> to
>> > >> >> express
>> > >> >> >> data
>> > >> >> >> >> > > > compute operations to each other.
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > However I think it'll be difficult to get buy-in if the
>> > only
>> > >> >> >> >> > > representation
>> > >> >> >> >> > > > for queries is as SQL strings.
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > Is there any kind of lower-level API that can be used
>> to
>> > >> >> express
>> > >> >> >> >> > > operations?
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > A structured representation like:
>> > >> >> >> >> > > > {
>> > >> >> >> >> > > >   "op": "query",
>> > >> >> >> >> > > >   "schema": "user",
>> > >> >> >> >> > > >   "project": ["name"]
>> > >> >> >> >> > > > }
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>> > >> >> >> >> > > >
>> > >> >> >> >> > > > Thank you =)
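
To make the structured-versus-string question concrete, here is a minimal sketch that turns the JSON shape from this message into the equivalent SQL string. `op_to_sql` is a hypothetical helper; it handles only this one toy shape and does no identifier quoting or validation, which a real translator would need.

```python
import json


def op_to_sql(op: dict) -> str:
    """Translate the toy {"op": "query", ...} shape into a SQL string."""
    if op.get("op") != "query":
        raise ValueError(f"unsupported op: {op.get('op')!r}")
    # An absent or empty projection is treated as "all columns".
    columns = ", ".join(op["project"]) if op.get("project") else "*"
    return f"SELECT {columns} FROM {op['schema']}"


request = json.loads('{"op": "query", "schema": "user", "project": ["name"]}')
print(op_to_sql(request))
```

Even at this size the translator has to invent its own rules (what an empty projection means, which ops exist), which is the work a shared plan format would take over.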
>> > >> >> >> >> > >
>> > >> >> >> >> >
>> > >> >> >> >>
>> > >> >> >> >>
>> > >> >> >> >> --
>> > >> >> >> >>
>> > >> >> >> >> *James Duong*
>> > >> >> >> >> Lead Software Developer
>> > >> >> >> >> Bit Quill Technologies Inc.
>> > >> >> >> >> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
>> > >> >> >> >> https://www.bitquilltech.com
>> > >> >> >> >>
>> > >> >> >> >> This email message is for the sole use of the intended
>> > >> recipient(s)
>> > >> >> >> and may
>> > >> >> >> >> contain confidential and privileged information.  Any
>> > unauthorized
>> > >> >> >> review,
>> > >> >> >> >> use, disclosure, or distribution is prohibited.  If you are
>> not
>> > >> the
>> > >> >> >> >> intended recipient, please contact the sender by reply email
>> > and
>> > >> >> >> destroy
>> > >> >> >> >> all copies of the original message.  Thank you.
>> > >> >> >> >>
>> > >> >> >>
>> > >> >> >
>> > >> >>
>> > >> >>
>> > >>
>> >
>>

[RFC] Substrait for Flight SQL

Posted by David Li <li...@apache.org>.
It was pointed out to me that by using an old thread, people may not have realized there's actually a discussion here. So this is just one final call for comments on a proposal to add support for Substrait [1] to Flight SQL: https://github.com/apache/arrow/pull/13492

The proposal also adds support for explicit transactions with a view towards making it easier to support interoperability with standards like JDBC and ODBC. 

There are implementations for C++ and Java, as well as integration tests. So assuming no further comments, I plan to start a vote on Monday.

[1]: https://substrait.io/

On Thu, Aug 18, 2022, at 14:01, David Li wrote:
> I've updated the PR [1] and I believe everything is resolved. (I've 
> fixed ARROW-17254, and changed the Protobuf definition to work around 
> Protobuf's issues.) If there are no further comments, I'll start a vote
> in the coming days.
>
> [1]: https://github.com/apache/arrow/pull/13492
>
> Thanks,
> David
>
> On Fri, Aug 5, 2022, at 14:54, David Li wrote:
>> I've added implementations for Java and C++ to the draft [1], including 
>> integration tests, after addressing comments on the proposal itself 
>> (thanks all for the comments). 
>>
>> One thing is, I might suggest punting on CancelQuery for now, or 
>> changing how it's implemented, since embedding a message from 
>> Flight.proto into FlightSql.proto interacts badly with Windows/DLLs 
>> (protoc has poor support for embedding dllimport/dllexport macros).
>>
>> Otherwise I think things are ready, though we'll want to fix 
>> ARROW-17254 [2] alongside it.
>>
>> [1]: https://github.com/apache/arrow/pull/13492
>> [2]: https://issues.apache.org/jira/browse/ARROW-17254
>>
>> On Fri, Jul 1, 2022, at 14:34, David Li wrote:
>>> I quickly drafted these out (sans implementation so far): 
>>> https://github.com/apache/arrow/pull/13492
>>>
>>> On Thu, Jun 30, 2022, at 21:20, David Li wrote:
>>>> Ah - somehow I didn't think of that. Yes, we should just implement it 
>>>> in the same way prepared statements are already implemented.
>>>>
>>>> On Thu, Jun 30, 2022, at 19:42, Micah Kornfield wrote:
>>>>>>
>>>>>> It would also then be good to make explicit the statefulness of
>>>>>> connections in Flight SQL. While that is sort of an obvious constraint, it
>>>>>> is at odds with how gRPC is usually used (especially in the presence of
>>>>>> load balancing).
>>>>>
>>>>>
>>>>> I'm not sure I understand where the statefulness requirements come in?
>>>>> Could you elaborate?  It seems that a transaction could be an opaque ID on
>>>>> operations?
>>>>>
>>>>> On Thu, Jun 30, 2022 at 2:47 PM James Duong <ja...@bitquilltech.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> This is a bit of a tangent from the original discussion about
>>>>>> Substrait integration.
>>>>>>
>>>>>> Flight SQL would definitely benefit from transaction RPC commands for
>>>>>> building bridge drivers. I'm also wondering if there should be an RPC call
>>>>>> to cancel a running query, as opposed to just having the client terminate
>>>>>> streams. This would allow a multi-process application to cancel work across
>>>>>> processes.
>>>>>> > >> > >> >> >> are
>>>>>> > >> > >> >> >> >> just JSON text.
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >> Is that the only bit you need, Gavin?
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <
>>>>>> > >> > ray.gavin97@gmail.com>
>>>>>> > >> > >> >> >> wrote:
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >> > I am enthusiastic about Substrait and have followed
>>>>>> its
>>>>>> > >> > >> progress
>>>>>> > >> > >> >> >> eagerly
>>>>>> > >> > >> >> >> >> > =D
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > When I presented it as a tentative option, there were
>>>>>> > >> > >> reservations
>>>>>> > >> > >> >> >> >> because
>>>>>> > >> > >> >> >> >> > of the project/spec being young and the functionality
>>>>>> > still
>>>>>> > >> > >> being
>>>>>> > >> > >> >> >> >> > fleshed out.
>>>>>> > >> > >> >> >> >> > I think if I were having this conversation in say,
>>>>>> 8-16
>>>>>> > >> > months,
>>>>>> > >> > >> it
>>>>>> > >> > >> >> >> would
>>>>>> > >> > >> >> >> >> > have been an easy choice, no doubt.
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > On a public mailing list (and I can share more details
>>>>>> > in
>>>>>> > >> > >> private
>>>>>> > >> > >> >> if
>>>>>> > >> > >> >> >> >> you're
>>>>>> > >> > >> >> >> >> > curious), the gist of it is this:
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for
>>>>>> > >> > expressing
>>>>>> > >> > >> >> data
>>>>>> > >> > >> >> >> >> > compute operations between services would be a useful
>>>>>> > thing
>>>>>> > >> > to
>>>>>> > >> > >> have
>>>>>> > >> > >> >> >> >> > (Especially if it's language-agnostic)
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > The goal is for an "implementing service" to have:
>>>>>> > >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to
>>>>>> > me")
>>>>>> > >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform
>>>>>> > this
>>>>>> > >> > >> operation
>>>>>> > >> > >> >> >> on
>>>>>> > >> > >> >> >> >> your
>>>>>> > >> > >> >> >> >> > data")
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > With FlightSQL this is possible I believe, but it
>>>>>> > requires
>>>>>> > >> > the
>>>>>> > >> > >> >> >> operation
>>>>>> > >> > >> >> >> >> to
>>>>>> > >> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > Working with some programmatic, structured object that
>>>>>> > has
>>>>>> > >> > the
>>>>>> > >> > >> same
>>>>>> > >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query
>>>>>> > would
>>>>>> > >> > >> have,
>>>>>> > >> > >> >> >> would
>>>>>> > >> > >> >> >> >> be
>>>>>> > >> > >> >> >> >> > a better experience
>>>>>> > >> > >> >> >> >> > (Jacques is on to something here!)
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > This interface between services would be somewhat the
>>>>>> > >> > >> equivalent of
>>>>>> > >> > >> >> >> an
>>>>>> > >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed
>>>>>> > library
>>>>>> > >> > for
>>>>>> > >> > >> >> >> >> expressing
>>>>>> > >> > >> >> >> >> > and building-up query/data-compute ops.
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <
>>>>>> > >> lidavidm@apache.org
>>>>>> > >> > >
>>>>>> > >> > >> >> wrote:
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
>>>>>> > >> > >> >> >> >> > >
>>>>>> > >> > >> >> >> >> > > Which is being worked on by several people,
>>>>>> including
>>>>>> > >> Arrow
>>>>>> > >> > >> >> >> community
>>>>>> > >> > >> >> >> >> > > members.
>>>>>> > >> > >> >> >> >> > >
>>>>>> > >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to
>>>>>> > >> include
>>>>>> > >> > >> >> >> support for
>>>>>> > >> > >> >> >> >> > > Substrait. I'm curious what your application is, if
>>>>>> > you're
>>>>>> > >> > able
>>>>>> > >> > >> to
>>>>>> > >> > >> >> >> share
>>>>>> > >> > >> >> >> >> > more.
>>>>>> > >> > >> >> >> >> > >
>>>>>> > >> > >> >> >> >> > > -David
>>>>>> > >> > >> >> >> >> > >
>>>>>> > >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
>>>>>> > >> > >> >> >> >> > > > Hiya,
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > I am drafting a proposal for a way to enable
>>>>>> > services
>>>>>> > >> to
>>>>>> > >> > >> >> express
>>>>>> > >> > >> >> >> data
>>>>>> > >> > >> >> >> >> > > > compute operations to each other.
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in
>>>>>> if
>>>>>> > the
>>>>>> > >> > only
>>>>>> > >> > >> >> >> >> > > representation
>>>>>> > >> > >> >> >> >> > > > for queries is as SQL strings.
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > Is there any kind of lower-level API that can be
>>>>>> > used
>>>>>> > >> to
>>>>>> > >> > >> >> express
>>>>>> > >> > >> >> >> >> > > operations?
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > A structured representation like:
>>>>>> > >> > >> >> >> >> > > > {
>>>>>> > >> > >> >> >> >> > > >   "op": "query",
>>>>>> > >> > >> >> >> >> > > >   "schema": "user",
>>>>>> > >> > >> >> >> >> > > >   "project": ["name"]
>>>>>> > >> > >> >> >> >> > > > }
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>>>>>> > >> > >> >> >> >> > > >
>>>>>> > >> > >> >> >> >> > > > Thank you =)
>>>>>> > >> > >> >> >> >> > >
>>>>>> > >> > >> >> >> >> >
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >> --
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >> *James Duong*
>>>>>> > >> > >> >> >> >> Lead Software Developer
>>>>>> > >> > >> >> >> >> Bit Quill Technologies Inc.
>>>>>> > >> > >> >> >> >> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
>>>>>> > >> > >> >> >> >> https://www.bitquilltech.com
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >> >> This email message is for the sole use of the intended
>>>>>> > >> > >> recipient(s)
>>>>>> > >> > >> >> >> and may
>>>>>> > >> > >> >> >> >> contain confidential and privileged information.  Any
>>>>>> > >> > unauthorized
>>>>>> > >> > >> >> >> review,
>>>>>> > >> > >> >> >> >> use, disclosure, or distribution is prohibited.  If you
>>>>>> > are
>>>>>> > >> not
>>>>>> > >> > >> the
>>>>>> > >> > >> >> >> >> intended recipient, please contact the sender by reply
>>>>>> > email
>>>>>> > >> > and
>>>>>> > >> > >> >> >> destroy
>>>>>> > >> > >> >> >> >> all copies of the original message.  Thank you.
>>>>>> > >> > >> >> >> >>
>>>>>> > >> > >> >> >>
>>>>>> > >> > >> >> >
>>>>>> > >> > >> >>
>>>>>> > >> > >> >>
>>>>>> > >> > >>
>>>>>> > >> >
>>>>>> > >>
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
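
For reference, the toy structured form from the original question quoted above ({"op": "query", "schema": "user", "project": ["name"]}) can be mapped back to the SQL string it stands in for. The following is only a minimal sketch of that mapping. It is not Substrait and not part of Flight SQL, and the names render_sql and quote_ident are hypothetical helpers invented for this illustration.

```python
import json


def quote_ident(name: str) -> str:
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def render_sql(op: dict) -> str:
    """Render the thread's toy JSON query shape as a SQL string."""
    if op.get("op") != "query":
        raise ValueError(f"unsupported op: {op.get('op')!r}")
    columns = op["project"]
    if not columns:
        raise ValueError("project must name at least one column")
    # Quote identifiers so names such as "user", which is reserved in
    # some SQL dialects, remain valid.
    projected = ", ".join(quote_ident(c) for c in columns)
    return f"SELECT {projected} FROM {quote_ident(op['schema'])}"


structured = json.loads('{"op": "query", "schema": "user", "project": ["name"]}')
print(render_sql(structured))
```

A server that accepts only SQL strings could apply a translation like this on the client side, which is roughly the "JSON as the query text" workaround discussed in the thread. A typed plan format avoids the round trip through text.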

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Posted by David Li <li...@apache.org>.
I've updated the PR [1] and I believe everything is resolved. (I've fixed ARROW-17254, and changed the Protobuf definition to work around Protobuf's issues.) If there are no further comments, I'll start a vote in the coming days.

[1]: https://github.com/apache/arrow/pull/13492

Thanks,
David
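
The quoted thread below raises the idea that a transaction could be an opaque ID passed along with each operation, in the same way prepared statements are already handled. The following is a minimal sketch of that idea, under the assumption that the server keeps a lookup table keyed by the handle. TransactionRegistry and its methods are hypothetical names for this illustration, not the API defined in the PR.

```python
import secrets


class TransactionRegistry:
    """Hypothetical server-side table keyed by opaque transaction handles."""

    def __init__(self):
        # Maps handle -> statements buffered for that transaction.
        self._open = {}

    def begin(self) -> bytes:
        # The handle is random bytes; clients treat it as opaque.
        handle = secrets.token_bytes(16)
        self._open[handle] = []
        return handle

    def execute(self, handle: bytes, statement: str) -> None:
        if handle not in self._open:
            raise KeyError("unknown or already finished transaction")
        self._open[handle].append(statement)

    def end(self, handle: bytes, commit: bool) -> list:
        if handle not in self._open:
            raise KeyError("unknown or already finished transaction")
        statements = self._open.pop(handle)
        # On rollback, the buffered statements are discarded.
        return statements if commit else []


registry = TransactionRegistry()
txn = registry.begin()
registry.execute(txn, "INSERT INTO t VALUES (1)")
committed = registry.end(txn, commit=True)
print(committed)
```

Because every call carries the handle, nothing here depends on which connection delivered the request. That only helps with load balancing if the state behind the handle is reachable from every server replica, which this in-memory sketch does not attempt.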

On Fri, Aug 5, 2022, at 14:54, David Li wrote:
> I've added implementations for Java and C++ to the draft [1], including 
> integration tests, after addressing comments on the proposal itself 
> (thanks all for the comments). 
>
> One thing is, I might suggest punting on CancelQuery for now, or 
> changing how it's implemented, since embedding a message from 
> Flight.proto into FlightSql.proto interacts badly with Windows/DLLs 
> (protoc has poor support for embedding dllimport/dllexport macros).
>
> Otherwise I think things are ready, though we'll want to fix 
> ARROW-17254 [2] alongside it.
>
> [1]: https://github.com/apache/arrow/pull/13492
> [2]: https://issues.apache.org/jira/browse/ARROW-17254
>
> On Fri, Jul 1, 2022, at 14:34, David Li wrote:
>> I quickly drafted these out (sans implementation so far): 
>> https://github.com/apache/arrow/pull/13492
>>
>> On Thu, Jun 30, 2022, at 21:20, David Li wrote:
>>> Ah - somehow I didn't think of that. Yes, we should just implement it 
>>> in the same way prepared statements are already implemented.
>>>
>>> On Thu, Jun 30, 2022, at 19:42, Micah Kornfield wrote:
>>>>>
>>>>> It would also then be good to make explicit the statefulness of
>>>>> connections in Flight SQL. While that is sort of an obvious constraint, it
>>>>> is at odds with how gRPC is usually used (especially in the presence of
>>>>> load balancing).
>>>>
>>>>
>>>> I'm not sure I understand where the statefulness requirements come in?
>>>> Could you elaborate?  It seems that a transaction could be an opaque ID on
>>>> operations?
>>>>
>>>> On Thu, Jun 30, 2022 at 2:47 PM James Duong <ja...@bitquilltech.com.invalid>
>>>> wrote:
>>>>
>>>>> This is a bit of a tangent from the original discussion about
>>>>> Substrait integration.
>>>>>
>>>>> Flight SQL would definitely benefit from transaction RPC commands for
>>>>> building bridge drivers. I'm also wondering if there should be an RPC call
>>>>> to cancel a running query, as opposed to just having the client terminate
>>>>> streams. This would allow a multi-process application to cancel work across
>>>>> processes.
>>>>>
>>>>> On Thu, Jun 30, 2022 at 1:35 PM David Li <li...@apache.org> wrote:
>>>>>
>>>>> > Reviving this discussion: would people be interested in seeing a
>>>>> > sketched-out CommandSubstraitQuery et al.?
>>>>> >
>>>>> > Additionally, while working on ADBC, I realized: does Flight SQL need
>>>>> > explicit Commit/Rollback commands? This would presumably be necessary if
>>>>> we
>>>>> > want to build ODBC/JDBC drivers on top, since those standards have
>>>>> explicit
>>>>> > commands, and Flight SQL doesn't have the luxury of a driver to issue
>>>>> > database-specific SQL to implement these.
>>>>> >
>>>>> > It would also then be good to make explicit the statefulness of
>>>>> > connections in Flight SQL. While that is sort of an obvious constraint,
>>>>> it
>>>>> > is at odds with how gRPC is usually used (especially in the presence of
>>>>> > load balancing).
>>>>> >
>>>>> > On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
>>>>> > > Got it, thank you David!
>>>>> > > I started prototyping the implementation last night, hopefully I will
>>>>> > make
>>>>> > > some good progress and have something basic functioning soon.
>>>>> > >
>>>>> > > RE: The metadata thing -- I think both Calcite and Teiid have solid
>>>>> > > interfaces for defining what capabilities a datasource has.
>>>>> > >
>>>>> >
>>>>> https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
>>>>> > >
>>>>> > > It's probably not possible to make something universal, but it seems
>>>>> like
>>>>> > > you could get pretty close to most common functionality/capabilities
>>>>> > >
>>>>> > >
>>>>> > > On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <kylep@bitquilltech.com
>>>>> > .invalid>
>>>>> > > wrote:
>>>>> > >
>>>>> > >> Yes, we should, where possible, avoid any one-off metadata. This is
>>>>> where
>>>>> > >> other standards fail in that applications must be custom built for
>>>>> each
>>>>> > >> data source, if we standardize the metadata then applications can at
>>>>> > least
>>>>> > >> be built to adapt.
>>>>> > >>
>>>>> > >> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <li...@apache.org>
>>>>> wrote:
>>>>> > >>
>>>>> > >> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's
>>>>> > use, so
>>>>> > >> > the application can use others for its own purposes. That said if
>>>>> they
>>>>> > >> seem
>>>>> > >> > commonly applicable maybe we should try to standardize them.
>>>>> > >> >
>>>>> > >> > I think what you are doing should be reasonable. You may not need
>>>>> > _all_
>>>>> > >> of
>>>>> > >> > the capabilities in Flight SQL for this (e.g. all the various
>>>>> metadata
>>>>> > >> > calls, or prepared statements, perhaps) but I don't see why it
>>>>> > wouldn't
>>>>> > >> > work for you.
>>>>> > >> >
>>>>> > >> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
>>>>> > >> > > To touch on the question about supported features -- is it
>>>>> possible
>>>>> > to
>>>>> > >> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
>>>>> > >> > > Say that you want to represent some set of behaviors that
>>>>> FlightSQL
>>>>> > >> > > services can support.
>>>>> > >> > >
>>>>> > >> > > Stuff like "Supports grouping by multiple distinct aggregates",
>>>>> > >> "Supports
>>>>> > >> > > self-joins on aliased tables" etc
>>>>> > >> > > This is going to be unique to each implementation, but I couldn't
>>>>> > >> > determine
>>>>> > >> > > whether there was a way to express arbitrary capabilities
>>>>> > >> > >
>>>>> > >> > > Also, in case it's helpful I put together an ASCII diagram of what
>>>>> > I'm
>>>>> > >> > > trying to do with FlightSQL
>>>>> > >> > > If anyone has a moment, would appreciate input on whether it's
>>>>> > >> feasible/a
>>>>> > >> > > good idea
>>>>> > >> > >
>>>>> > >> > > https://pastebin.com/raw/VF2r0F3f
>>>>> > >> > >
>>>>> > >> > > Thank you =)
>>>>> > >> > >
>>>>> > >> > >
>>>>> > >> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <li...@apache.org>
>>>>> > wrote:
>>>>> > >> > >
>>>>> > >> > >> We could also add say CommandSubstraitQuery as a distinct
>>>>> message,
>>>>> > and
>>>>> > >> > >> older servers would just reject it as an unknown request type.
>>>>> > >> > >>
>>>>> > >> > >> -David
>>>>> > >> > >>
>>>>> > >> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
>>>>> > >> > >> >>
>>>>> > >> > >> >> 1. How does a server report that it supports each command
>>>>> type?
>>>>> > >> > Initial
>>>>> > >> > >> >> thought is a property in GetSqlInfo.
>>>>> > >> > >> >
>>>>> > >> > >> >
>>>>> > >> > >> > This sounds reasonable.
>>>>> > >> > >> >
>>>>> > >> > >> >
>>>>> > >> > >> >> What happens to client code written prior to changing the
>>>>> > command
>>>>> > >> > type
>>>>> > >> > >> >> to be a oneOf field? Same for servers.
>>>>> > >> > >> >
>>>>> > >> > >> >
>>>>> > >> > >> > It is transparent from older clients (I'm 99% sure the wire
>>>>> > protocol
>>>>> > >> > >> > doesn't change).  Servers is a little harder.  The one saving
>>>>> > grace
>>>>> > >> > is I
>>>>> > >> > >> > don't think an empty/not-present SQL string would be something
>>>>> > most
>>>>> > >> > >> servers
>>>>> > >> > >> > could handle, so they would probably error with something that
>>>>> > while
>>>>> > >> > >> > not-obvious would give a clue to the clients (but hopefully
>>>>> this
>>>>> > >> would
>>>>> > >> > >> be a
>>>>> > >> > >> > non-issue because the capabilities would be checked for clients
>>>>> > >> > wishing
>>>>> > >> > >> to
>>>>> > >> > >> > use this feature first).
>>>>> > >> > >> >
>>>>> > >> > >> > -Micah
>>>>> > >> > >> >
>>>>> > >> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <
>>>>> > jamesd@bitquilltech.com
>>>>> > >> > >> .invalid>
>>>>> > >> > >> > wrote:
>>>>> > >> > >> >
>>>>> > >> > >> >> It sounds like an interesting and useful project to use
>>>>> > Substrait
>>>>> > >> > as an
>>>>> > >> > >> >> alternative to SQL strings.
>>>>> > >> > >> >>
>>>>> > >> > >> >> Important aspects to spec out are:
>>>>> > >> > >> >> 1. How does a server report that it supports each command
>>>>> type?
>>>>> > >> > Initial
>>>>> > >> > >> >> thought is a property in GetSqlInfo.
>>>>> > >> > >> >> 2. What happens to client code written prior to changing the
>>>>> > >> command
>>>>> > >> > >> type
>>>>> > >> > >> >> to be a oneOf field? Same for servers.
>>>>> > >> > >> >> More generally, how should backward compatibility work, and
>>>>> what
>>>>> > >> > should
>>>>> > >> > >> >> happen if a client sends an unsupported
>>>>> > >> > >> >> command type to a server.
>>>>> > >> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait
>>>>> > >> > structures?
>>>>> > >> > >> >>
>>>>> > >> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <
>>>>> > ray.gavin97@gmail.com>
>>>>> > >> > >> wrote:
>>>>> > >> > >> >>
>>>>> > >> > >> >> > @James Duong <ja...@bitquilltech.com>
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > You are absolutely right, I realized this and confirmed
>>>>> > whether
>>>>> > >> > this
>>>>> > >> > >> >> > would be possible with Jacques to double-check.
>>>>> > >> > >> >> > It would amount to what I might call "dollar-store
>>>>> Substrait."
>>>>> > >> It's
>>>>> > >> > >> not
>>>>> > >> > >> >> > elegant or a good solution, but definitely presents a good
>>>>> > >> > duct-tape
>>>>> > >> > >> hack
>>>>> > >> > >> >> > and is a crafty idea.
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > I agree with Jacques -- when you think about FlightSQL, what
>>>>> > you
>>>>> > >> > are
>>>>> > >> > >> >> > attempting with a query isn't necessarily SQL, but a general
>>>>> > >> > >> data-compute
>>>>> > >> > >> >> > operation.
>>>>> > >> > >> >> > SQL just so happens to be a fairly universal way to express
>>>>> > them,
>>>>> > >> > >> with an
>>>>> > >> > >> >> > ANSI standard, but FlightSQL doesn't recognize any
>>>>> particular
>>>>> > >> > subset
>>>>> > >> > >> of
>>>>> > >> > >> >> it
>>>>> > >> > >> >> > and for all intents and purposes it doesn't matter what the
>>>>> > >> > operation
>>>>> > >> > >> >> > string contains.
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > Substrait would make a fantastic logical next-feature
>>>>> because
>>>>> > >> it's
>>>>> > >> > >> >> > targeted as a specification for expressing relational
>>>>> algebra
>>>>> > and
>>>>> > >> > >> >> > data-compute operations
>>>>> > >> > >> >> > This more-or-less equates to SQL strings (in my mind at
>>>>> least)
>>>>> > >> > with a
>>>>> > >> > >> >> much
>>>>> > >> > >> >> > better toolkit and Dev UX. If there is anything I can do to
>>>>> > help
>>>>> > >> > move
>>>>> > >> > >> >> this
>>>>> > >> > >> >> > forward, please let me know because I am extremely motivated
>>>>> > to
>>>>> > >> do
>>>>> > >> > so.
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > @David Li <gi...@lidavidm.me>
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > Also agreed. Substrait is put together by folks much smarter
>>>>> > than
>>>>> > >> > >> myself,
>>>>> > >> > >> >> > and if I had to hedge my bets, I'd put money on it being the
>>>>> > >> > future of
>>>>> > >> > >> >> > data-compute interop.
>>>>> > >> > >> >> > I would love nothing more than to adopt this technology and
>>>>> > push
>>>>> > >> it
>>>>> > >> > >> >> along.
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > Your project does sound interesting - basically, it sounds
>>>>> > like a
>>>>> > >> > >> tabular
>>>>> > >> > >> >> >> data storage service with query pushdown?
>>>>> > >> > >> >> >>
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > Yeah this is more or less the details of it (my personal
>>>>> > email,
>>>>> > >> > with
>>>>> > >> > >> >> > discretion assumed, is always open)
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > Imagine an environment where a backend wants to advertise
>>>>> some
>>>>> > >> > kind of
>>>>> > >> > >> >> > schema/data catalog
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > And then a central service introspects these backends, and
>>>>> > >> > dynamically
>>>>> > >> > >> >> > generates an API from the data catalogues/schemas, where
>>>>> > requests
>>>>> > >> > get
>>>>> > >> > >> >> > proxied to the underlying backend service for each schema to
>>>>> > >> > actually
>>>>> > >> > >> be
>>>>> > >> > >> >> > executed
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > In text, the flow would look something like:
>>>>> > >> > >> >> >
>>>>> > >> > >> >> >
>>>>> > >> > >> >> >                                                    <----> Data Provider Backend 0
>>>>> > >> > >> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
>>>>> > >> > >> >> >                                                    <----> Data Provider Backend 2
>>>>> > >> > >> >> >
>>>>> > >> > >> >> >
>>>>> > >> > >> >> >
>>>>> > >> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <
>>>>> lidavidm@apache.org>
>>>>> > >> > wrote:
>>>>> > >> > >> >> >
>>>>> > >> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an
>>>>> > >> > >> alternative to
>>>>> > >> > >> >> >> Substrait, at least one that isn't even more nascent or one
>>>>> > >> that's
>>>>> > >> > >> very
>>>>> > >> > >> >> >> tied to a particular language, so perhaps it might be
>>>>> better
>>>>> > to
>>>>> > >> > get
>>>>> > >> > >> >> >> involved in Substrait and see if it suits your needs?
>>>>> > >> Convincing a
>>>>> > >> > >> team
>>>>> > >> > >> >> to
>>>>> > >> > >> >> >> try something new can be hard, though, and it is somewhat
>>>>> of
>>>>> > a
>>>>> > >> > moving
>>>>> > >> > >> >> >> target - but Flight SQL is in a similar spot, I think, as
>>>>> > it's
>>>>> > >> > still
>>>>> > >> > >> >> >> getting enhancements.
>>>>> > >> > >> >> >>
>>>>> > >> > >> >> >> Your project does sound interesting - basically, it sounds
>>>>> > like
>>>>> > >> a
>>>>> > >> > >> >> tabular
>>>>> > >> > >> >> >> data storage service with query pushdown?
>>>>> > >> > >> >> >>
>>>>> > >> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
>>>>> > >> > >> >> >> > James, I agree that you could use JSON but that feels a
>>>>> bit
>>>>> > >> > hacky
>>>>> > >> > >> >> >> > (mis-use
>>>>> > >> > >> >> >> > of the paradigm). Instead, I'd really like to do
>>>>> something
>>>>> > >> like
>>>>> > >> > >> David
>>>>> > >> > >> >> is
>>>>> > >> > >> >> >> > suggesting: support Substrait as an alternative to a SQL
>>>>> > >> string.
>>>>> > >> > >> >> >> > Something like this:
>>>>> > >> > >> >> >> >
>>>>> > >> > >> >> >>
>>>>> > >> > >> >>
>>>>> > >> > >>
>>>>> > >> >
>>>>> > >>
>>>>> >
>>>>> https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
>>>>> > >> > >> >> >> >
>>>>> > >> > >> >> >> > It would be great if someone wanted to pick this up. It
>>>>> > would
>>>>> > >> > be a
>>>>> > >> > >> >> nice
>>>>> > >> > >> >> >> > enhancement to FlightSQL (and provide a structured way to
>>>>> > >> > express
>>>>> > >> > >> >> >> > operations).
>>>>> > >> > >> >> >> >
>>>>> > >> > >> >> >> >
>>>>> > >> > >> >> >> >
>>>>> > >> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <
>>>>> > >> > >> jamesd@bitquilltech.com
>>>>> > >> > >> >> >> .invalid>
>>>>> > >> > >> >> >> > wrote:
>>>>> > >> > >> >> >> >
>>>>> > >> > >> >> >> >> In the same way that you could write an ODBC driver that
>>>>> > >> takes
>>>>> > >> > in
>>>>> > >> > >> >> text
>>>>> > >> > >> >> >> >> that's not SQL, you could write a Flight SQL server that
>>>>> > >> takes
>>>>> > >> > in
>>>>> > >> > >> >> text
>>>>> > >> > >> >> >> >> that's JSON.
>>>>> > >> > >> >> >> >> Flight SQL doesn't parse the query, so you could create
>>>>> > >> > commands
>>>>> > >> > >> that
>>>>> > >> > >> >> >> are
>>>>> > >> > >> >> >> >> just JSON text.
>>>>> > >> > >> >> >> >>
>>>>> > >> > >> >> >> >> Is that the only bit you need, Gavin?
>>>>> > >> > >> >> >> >>
>>>>> > >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <
>>>>> > >> > ray.gavin97@gmail.com>
>>>>> > >> > >> >> >> wrote:
>>>>> > >> > >> >> >> >>
>>>>> > >> > >> >> >> >> > I am enthusiastic about Substrait and have followed
>>>>> its
>>>>> > >> > >> progress
>>>>> > >> > >> >> >> eagerly
>>>>> > >> > >> >> >> >> > =D
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > When I presented it as a tentative option, there were
>>>>> > >> > >> reservations
>>>>> > >> > >> >> >> >> because
>>>>> > >> > >> >> >> >> > of the project/spec being young and the functionality
>>>>> > still
>>>>> > >> > >> being
>>>>> > >> > >> >> >> >> > fleshed out.
>>>>> > >> > >> >> >> >> > I think if I were having this conversation in say,
>>>>> 8-16
>>>>> > >> > months,
>>>>> > >> > >> it
>>>>> > >> > >> >> >> would
>>>>> > >> > >> >> >> >> > have been an easy choice, no doubt.
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > On a public mailing list (and I can share more details
>>>>> > in
>>>>> > >> > >> private
>>>>> > >> > >> >> if
>>>>> > >> > >> >> >> >> you're
>>>>> > >> > >> >> >> >> > curious), the gist of it is this:
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for
>>>>> > >> > expressing
>>>>> > >> > >> >> data
>>>>> > >> > >> >> >> >> > compute operations between services would be a useful
>>>>> > thing
>>>>> > >> > to
>>>>> > >> > >> have
>>>>> > >> > >> >> >> >> > (Especially if it's language-agnostic)
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > The goal is for an "implementing service" to have:
>>>>> > >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to
>>>>> > me")
>>>>> > >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform
>>>>> > this
>>>>> > >> > >> operation
>>>>> > >> > >> >> >> on
>>>>> > >> > >> >> >> >> your
>>>>> > >> > >> >> >> >> > data")
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > With FlightSQL this is possible I believe, but it
>>>>> > requires
>>>>> > >> > the
>>>>> > >> > >> >> >> operation
>>>>> > >> > >> >> >> >> to
>>>>> > >> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > Working with some programmatic, structured object that
>>>>> > has
>>>>> > >> > the
>>>>> > >> > >> same
>>>>> > >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query
>>>>> > would
>>>>> > >> > >> have,
>>>>> > >> > >> >> >> would
>>>>> > >> > >> >> >> >> be
>>>>> > >> > >> >> >> >> > a better experience
>>>>> > >> > >> >> >> >> > (Jacques is on to something here!)
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > This interface between services would be somewhat the
>>>>> > >> > >> equivalent of
>>>>> > >> > >> >> >> an
>>>>> > >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed
>>>>> > library
>>>>> > >> > for
>>>>> > >> > >> >> >> >> expressing
>>>>> > >> > >> >> >> >> > and building-up query/data-compute ops.
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <
>>>>> > >> lidavidm@apache.org
>>>>> > >> > >
>>>>> > >> > >> >> wrote:
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
>>>>> > >> > >> >> >> >> > >
>>>>> > >> > >> >> >> >> > > Which is being worked on by several people,
>>>>> including
>>>>> > >> Arrow
>>>>> > >> > >> >> >> community
>>>>> > >> > >> >> >> >> > > members.
>>>>> > >> > >> >> >> >> > >
>>>>> > >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to
>>>>> > >> include
>>>>> > >> > >> >> >> support for
>>>>> > >> > >> >> >> >> > > Substrait. I'm curious what your application is, if
>>>>> > you're
>>>>> > >> > able
>>>>> > >> > >> to
>>>>> > >> > >> >> >> share
>>>>> > >> > >> >> >> >> > more.
>>>>> > >> > >> >> >> >> > >
>>>>> > >> > >> >> >> >> > > -David
>>>>> > >> > >> >> >> >> > >
>>>>> > >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
>>>>> > >> > >> >> >> >> > > > Hiya,
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > I am drafting a proposal for a way to enable
>>>>> > services
>>>>> > >> to
>>>>> > >> > >> >> express
>>>>> > >> > >> >> >> data
>>>>> > >> > >> >> >> >> > > > compute operations to each other.
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in
>>>>> if
>>>>> > the
>>>>> > >> > only
>>>>> > >> > >> >> >> >> > > representation
>>>>> > >> > >> >> >> >> > > > for queries is as SQL strings.
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > Is there any kind of lower-level API that can be
>>>>> > used
>>>>> > >> to
>>>>> > >> > >> >> express
>>>>> > >> > >> >> >> >> > > operations?
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > A structured representation like:
>>>>> > >> > >> >> >> >> > > > {
>>>>> > >> > >> >> >> >> > > >   "op": "query",
>>>>> > >> > >> >> >> >> > > >   "schema": "user",
>>>>> > >> > >> >> >> >> > > >   "project": ["name"]
>>>>> > >> > >> >> >> >> > > > }
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>>>>> > >> > >> >> >> >> > > >
>>>>> > >> > >> >> >> >> > > > Thank you =)
>>>>> > >> > >> >> >> >> > >
>>>>> > >> > >> >> >> >> >
>>>>> > >> > >> >> >> >>
>>>>> > >> > >> >> >> >>
>>>>> > >> > >> >> >>
>>>>> > >> > >> >> >
>>>>> > >> > >> >>
>>>>> > >> > >>
>>>>> > >> >
>>>>> > >>
>>>>> >
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *James Duong*
>>>>> Lead Software Developer
>>>>> Bit Quill Technologies Inc.
>>>>> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
>>>>> https://www.bitquilltech.com
>>>>>
>>>>> This email message is for the sole use of the intended recipient(s) and may
>>>>> contain confidential and privileged information.  Any unauthorized review,
>>>>> use, disclosure, or distribution is prohibited.  If you are not the
>>>>> intended recipient, please contact the sender by reply email and destroy
>>>>> all copies of the original message.  Thank you.
>>>>>

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Posted by David Li <li...@apache.org>.
I've added implementations for Java and C++ to the draft [1], including integration tests, after addressing comments on the proposal itself (thanks all for the comments). 

One caveat: I'd suggest punting on CancelQuery for now, or changing how it's implemented, since embedding a message from Flight.proto into FlightSql.proto interacts badly with Windows DLLs (protoc has poor support for embedding dllimport/dllexport macros).

Otherwise I think things are ready, though we'll want to fix ARROW-17254 [2] alongside it.

[1]: https://github.com/apache/arrow/pull/13492
[2]: https://issues.apache.org/jira/browse/ARROW-17254
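
For readers skimming the thread quoted below, here is a rough sketch of the "transaction as an opaque ID on operations" idea, handled the same way as a prepared statement handle rather than as connection state. Everything in it is hypothetical plain Python: TransactionStore, begin, execute, commit and rollback are illustrative names, not the Flight SQL API and not what the draft PR defines.

```python
import secrets


class TransactionStore:
    """Hypothetical server-side store keyed by opaque transaction IDs."""

    def __init__(self):
        self._pending = {}   # transaction ID -> statements buffered so far
        self.committed = []  # statements made visible by commit()

    def begin(self) -> bytes:
        # The ID is opaque to the client; it only passes it back on later calls.
        txn_id = secrets.token_bytes(16)
        self._pending[txn_id] = []
        return txn_id

    def execute(self, txn_id: bytes, statement: str) -> None:
        if txn_id not in self._pending:
            raise KeyError("unknown or finished transaction")
        self._pending[txn_id].append(statement)

    def commit(self, txn_id: bytes) -> None:
        # pop() raises KeyError for an unknown or already finished transaction.
        self.committed.extend(self._pending.pop(txn_id))

    def rollback(self, txn_id: bytes) -> None:
        self._pending.pop(txn_id)
```

Because each call carries the ID, any server that can reach the shared store could serve it, which is why this approach would sit more comfortably with gRPC load balancing than state tied to a connection.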

On Fri, Jul 1, 2022, at 14:34, David Li wrote:
> I quickly drafted these out (sans implementation so far): 
> https://github.com/apache/arrow/pull/13492
>
> On Thu, Jun 30, 2022, at 21:20, David Li wrote:
>> Ah - somehow I didn't think of that. Yes, we should just implement it 
>> in the same way prepared statements are already implemented.
>>
>> On Thu, Jun 30, 2022, at 19:42, Micah Kornfield wrote:
>>>>
>>>> It would also then be good to make explicit the statefulness of
>>>> connections in Flight SQL. While that is sort of an obvious constraint, it
>>>> is at odds with how gRPC is usually used (especially in the presence of
>>>> load balancing).
>>>
>>>
>>> I'm not sure I understand where the statefulness requirements come in?
>>> Could you elaborate?  It seems that a transaction could be an opaque ID on
>>> operations?
>>>
>>> On Thu, Jun 30, 2022 at 2:47 PM James Duong <ja...@bitquilltech.com.invalid>
>>> wrote:
>>>
>>>> This is a bit of a tangent from the original discussion about
>>>> Substrait integration.
>>>>
>>>> Flight SQL would definitely benefit from transaction RPC commands for
>>>> building bridge drivers. I'm also wondering if there should be an RPC call
>>>> to cancel a running query, as opposed to just having the client terminate
>>>> streams. This would allow a multi-process application to cancel work across
>>>> processes.
>>>>
>>>> On Thu, Jun 30, 2022 at 1:35 PM David Li <li...@apache.org> wrote:
>>>>
>>>> > Reviving this discussion: would people be interested in seeing a
>>>> > sketched-out CommandSubstraitQuery et al.?
>>>> >
>>>> > Additionally, while working on ADBC, I realized: does Flight SQL need
>>>> > explicit Commit/Rollback commands? This would presumably be necessary if
>>>> we
>>>> > want to build ODBC/JDBC drivers on top, since those standards have
>>>> explicit
>>>> > commands, and Flight SQL doesn't have the luxury of a driver to issue
>>>> > database-specific SQL to implement these.
>>>> >
>>>> > It would also then be good to make explicit the statefulness of
>>>> > connections in Flight SQL. While that is sort of an obvious constraint,
>>>> it
>>>> > is at odds with how gRPC is usually used (especially in the presence of
>>>> > load balancing).
>>>> >
>>>> > On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
>>>> > > Got it, thank you David!
>>>> > > I started prototyping the implementation last night, hopefully I will
>>>> > make
>>>> > > some good progress and have something basic functioning soon.
>>>> > >
>>>> > > RE: The metadata thing -- I think both Calcite and Teiid have solid
>>>> > > interfaces for defining what capabilities a datasource has.
>>>> > >
>>>> >
>>>> https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
>>>> > >
>>>> > > It's probably not possible to make something universal, but it seems
>>>> like
>>>> > > you could get pretty close to most common functionality/capabilities
>>>> > >
>>>> > >
>>>> > > On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <kylep@bitquilltech.com
>>>> > .invalid>
>>>> > > wrote:
>>>> > >
>>>> > >> Yes, we should, where possible, avoid any one-off metadata. This is where
>>>> > >> other standards fail, in that applications must be custom built for each
>>>> > >> data source; if we standardize the metadata, then applications can at least
>>>> > >> be built to adapt.
>>>> > >>
>>>> > >> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <li...@apache.org>
>>>> wrote:
>>>> > >>
>>>> > >> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's
>>>> > use, so
>>>> > >> > the application can use others for its own purposes. That said if
>>>> they
>>>> > >> seem
>>>> > >> > commonly applicable maybe we should try to standardize them.
>>>> > >> >
>>>> > >> > I think what you are doing should be reasonable. You may not need
>>>> > _all_
>>>> > >> of
>>>> > >> > the capabilities in Flight SQL for this (e.g. all the various
>>>> metadata
>>>> > >> > calls, or prepared statements, perhaps) but I don't see why it
>>>> > wouldn't
>>>> > >> > work for you.
>>>> > >> >
>>>> > >> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
>>>> > >> > > To touch on the question about supported features -- is it possible to
>>>> > >> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
>>>> > >> > > Say that you want to represent some set of behaviors that FlightSQL
>>>> > >> > > services can support.
>>>> > >> > >
>>>> > >> > > Stuff like "Supports grouping by multiple distinct aggregates",
>>>> > >> "Supports
>>>> > >> > > self-joins on aliased tables" etc
>>>> > >> > > This is going to be unique to each implementation, but I couldn't
>>>> > >> > determine
>>>> > >> > > whether there was a way to express arbitrary capabilities
>>>> > >> > >
>>>> > >> > > Also, in case it's helpful I put together an ASCII diagram of what
>>>> > I'm
>>>> > >> > > trying to do with FlightSQL
>>>> > >> > > If anyone has a moment, would appreciate input on whether it's
>>>> > >> feasible/a
>>>> > >> > > good idea
>>>> > >> > >
>>>> > >> > > https://pastebin.com/raw/VF2r0F3f
>>>> > >> > >
>>>> > >> > > Thank you =)
>>>> > >> > >
>>>> > >> > >
>>>> > >> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <li...@apache.org>
>>>> > wrote:
>>>> > >> > >
>>>> > >> > >> We could also add say CommandSubstraitQuery as a distinct
>>>> message,
>>>> > and
>>>> > >> > >> older servers would just reject it as an unknown request type.
>>>> > >> > >>
>>>> > >> > >> -David
>>>> > >> > >>
>>>> > >> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
>>>> > >> > >> >>
>>>> > >> > >> >> 1. How does a server report that it supports each command
>>>> type?
>>>> > >> > Initial
>>>> > >> > >> >> thought is a property in GetSqlInfo.
>>>> > >> > >> >
>>>> > >> > >> >
>>>> > >> > >> > This sounds reasonable.
>>>> > >> > >> >
>>>> > >> > >> >
>>>> > >> > >> >> What happens to client code written prior to changing the
>>>> > command
>>>> > >> > type
>>>> > >> > >> >> to be a oneOf field? Same for servers.
>>>> > >> > >> >
>>>> > >> > >> >
>>>> > >> > >> > It is transparent to older clients (I'm 99% sure the wire protocol
>>>> > >> > >> > doesn't change).  Servers are a little harder.  The one saving grace is I
>>>> > >> > >> > don't think an empty/not-present SQL string would be something most servers
>>>> > >> > >> > could handle, so they would probably error with something that, while
>>>> > >> > >> > not obvious, would give a clue to the clients (but hopefully this would be a
>>>> > >> > >> > non-issue because the capabilities would be checked first by clients wishing
>>>> > >> > >> > to use this feature).
>>>> > >> > >> >
>>>> > >> > >> > -Micah
>>>> > >> > >> >
>>>> > >> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <
>>>> > jamesd@bitquilltech.com
>>>> > >> > >> .invalid>
>>>> > >> > >> > wrote:
>>>> > >> > >> >
>>>> > >> > >> >> It sounds like an interesting and useful project to use Substrait
>>>> > >> > >> >> as an alternative to SQL strings.
>>>> > >> > >> >>
>>>> > >> > >> >> Important aspects to spec out are:
>>>> > >> > >> >> 1. How does a server report that it supports each command
>>>> type?
>>>> > >> > Initial
>>>> > >> > >> >> thought is a property in GetSqlInfo.
>>>> > >> > >> >> 2. What happens to client code written prior to changing the
>>>> > >> command
>>>> > >> > >> type
>>>> > >> > >> >> to be a oneOf field? Same for servers.
>>>> > >> > >> >> More generally, how should backward compatibility work, and
>>>> what
>>>> > >> > should
>>>> > >> > >> >> happen if a client sends an unsupported
>>>> > >> > >> >> command type to a server.
>>>> > >> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait
>>>> > >> > structures?
>>>> > >> > >> >>
>>>> > >> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <
>>>> > ray.gavin97@gmail.com>
>>>> > >> > >> wrote:
>>>> > >> > >> >>
>>>> > >> > >> >> > @James Duong <ja...@bitquilltech.com>
>>>> > >> > >> >> >
>>>> > >> > >> >> > You are absolutely right, I realized this and confirmed
>>>> > whether
>>>> > >> > this
>>>> > >> > >> >> > would be possible with Jacques to double-check.
>>>> > >> > >> >> > It would amount to what I might call "dollar-store
>>>> Substrait."
>>>> > >> It's
>>>> > >> > >> not
>>>> > >> > >> >> > elegant or a good solution, but definitely presents a good
>>>> > >> > duct-tape
>>>> > >> > >> hack
>>>> > >> > >> >> > and is a crafty idea.
>>>> > >> > >> >> >
>>>> > >> > >> >> > I agree with Jacques -- when you think about FlightSQL, what
>>>> > you
>>>> > >> > are
>>>> > >> > >> >> > attempting with a query isn't necessarily SQL, but a general
>>>> > >> > >> data-compute
>>>> > >> > >> >> > operation.
>>>> > >> > >> >> > SQL just so happens to be a fairly universal way to express
>>>> > them,
>>>> > >> > >> with an
>>>> > >> > >> >> > ANSI standard, but FlightSQL doesn't recognize any
>>>> particular
>>>> > >> > subset
>>>> > >> > >> of
>>>> > >> > >> >> it
>>>> > >> > >> >> > and for all intents and purposes it doesn't matter what the
>>>> > >> > operation
>>>> > >> > >> >> > string contains.
>>>> > >> > >> >> >
>>>> > >> > >> >> > Substrait would make a fantastic logical next-feature
>>>> because
>>>> > >> it's
>>>> > >> > >> >> > targeted as a specification for expressing relational
>>>> algebra
>>>> > and
>>>> > >> > >> >> > data-compute operations
>>>> > >> > >> >> > This more-or-less equates to SQL strings (in my mind at
>>>> least)
>>>> > >> > with a
>>>> > >> > >> >> much
>>>> > >> > >> >> > better toolkit and Dev UX. If there is anything I can do to
>>>> > help
>>>> > >> > move
>>>> > >> > >> >> this
>>>> > >> > >> >> > forward, please let me know because I am extremely motivated
>>>> > to
>>>> > >> do
>>>> > >> > so.
>>>> > >> > >> >> >
>>>> > >> > >> >> > @David Li <gi...@lidavidm.me>
>>>> > >> > >> >> >
>>>> > >> > >> >> > Also agreed. Substrait is put together by folks much smarter
>>>> > than
>>>> > >> > >> myself,
>>>> > >> > >> >> > and if I had to hedge my bets, I'd put money on it being the
>>>> > >> > future of
>>>> > >> > >> >> > data-compute interop.
>>>> > >> > >> >> > I would love nothing more than to adopt this technology and
>>>> > push
>>>> > >> it
>>>> > >> > >> >> along.
>>>> > >> > >> >> >
>>>> > >> > >> >> > Your project does sound interesting - basically, it sounds
>>>> > like a
>>>> > >> > >> tabular
>>>> > >> > >> >> >> data storage service with query pushdown?
>>>> > >> > >> >> >>
>>>> > >> > >> >> >
>>>> > >> > >> >> > Yeah this is more or less the details of it (my personal
>>>> > email,
>>>> > >> > with
>>>> > >> > >> >> > discretion assumed, is always open)
>>>> > >> > >> >> >
>>>> > >> > >> >> > Imagine an environment where a backend wants to advertise
>>>> some
>>>> > >> > kind of
>>>> > >> > >> >> > schema/data catalog
>>>> > >> > >> >> >
>>>> > >> > >> >> > And then a central service introspects these backends, and
>>>> > >> > dynamically
>>>> > >> > >> >> > generates an API from the data catalogues/schemas, where
>>>> > requests
>>>> > >> > get
>>>> > >> > >> >> > proxied to the underlying backend service for each schema to
>>>> > >> > actually
>>>> > >> > >> be
>>>> > >> > >> >> > executed
>>>> > >> > >> >> >
>>>> > >> > >> >> > In text, the flow would look something like:
>>>> > >> > >> >> >
>>>> > >> > >> >> >
>>>> > >> > >> >> >                                                    <----> Data Provider Backend 0
>>>> > >> > >> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
>>>> > >> > >> >> >                                                    <----> Data Provider Backend 2
>>>> > >> > >> >> >
>>>> > >> > >> >> >
>>>> > >> > >> >> >
>>>> > >> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <
>>>> lidavidm@apache.org>
>>>> > >> > wrote:
>>>> > >> > >> >> >
>>>> > >> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an
>>>> > >> > >> alternative to
>>>> > >> > >> >> >> Substrait, at least one that isn't even more nascent or one
>>>> > >> that's
>>>> > >> > >> very
>>>> > >> > >> >> >> tied to a particular language, so perhaps it might be
>>>> better
>>>> > to
>>>> > >> > get
>>>> > >> > >> >> >> involved in Substrait and see if it suits your needs?
>>>> > >> Convincing a
>>>> > >> > >> team
>>>> > >> > >> >> to
>>>> > >> > >> >> >> try something new can be hard, though, and it is somewhat
>>>> of
>>>> > a
>>>> > >> > moving
>>>> > >> > >> >> >> target - but Flight SQL is in a similar spot, I think, as
>>>> > it's
>>>> > >> > still
>>>> > >> > >> >> >> getting enhancements.
>>>> > >> > >> >> >>
>>>> > >> > >> >> >> Your project does sound interesting - basically, it sounds
>>>> > like
>>>> > >> a
>>>> > >> > >> >> tabular
>>>> > >> > >> >> >> data storage service with query pushdown?
>>>> > >> > >> >> >>
>>>> > >> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
>>>> > >> > >> >> >> > James, I agree that you could use JSON but that feels a
>>>> bit
>>>> > >> > hacky
>>>> > >> > >> >> >> > (mis-use
>>>> > >> > >> >> >> > of the paradigm). Instead, I'd really like to do
>>>> something
>>>> > >> like
>>>> > >> > >> David
>>>> > >> > >> >> is
>>>> > >> > >> >> >> > suggesting: support Substrait as an alternative to a SQL
>>>> > >> string.
>>>> > >> > >> >> >> > Something like this:
>>>> > >> > >> >> >> >
>>>> > >> > >> >> >>
>>>> > >> > >> >>
>>>> > >> > >>
>>>> > >> >
>>>> > >>
>>>> >
>>>> https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
>>>> > >> > >> >> >> >
>>>> > >> > >> >> >> > It would be great if someone wanted to pick this up. It
>>>> > would
>>>> > >> > be a
>>>> > >> > >> >> nice
>>>> > >> > >> >> >> > enhancement to FlightSQL (and provide a structured way to
>>>> > >> > express
>>>> > >> > >> >> >> > operations).
>>>> > >> > >> >> >> >
>>>> > >> > >> >> >> >
>>>> > >> > >> >> >> >
>>>> > >> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <
>>>> > >> > >> jamesd@bitquilltech.com
>>>> > >> > >> >> >> .invalid>
>>>> > >> > >> >> >> > wrote:
>>>> > >> > >> >> >> >
>>>> > >> > >> >> >> >> In the same way that you could write an ODBC driver that
>>>> > >> takes
>>>> > >> > in
>>>> > >> > >> >> text
>>>> > >> > >> >> >> >> that's not SQL, you could write a Flight SQL server that
>>>> > >> takes
>>>> > >> > in
>>>> > >> > >> >> text
>>>> > >> > >> >> >> >> that's JSON.
>>>> > >> > >> >> >> >> Flight SQL doesn't parse the query, so you could create
>>>> > >> > commands
>>>> > >> > >> that
>>>> > >> > >> >> >> are
>>>> > >> > >> >> >> >> just JSON text.
>>>> > >> > >> >> >> >>
>>>> > >> > >> >> >> >> Is that the only bit you need, Gavin?
>>>> > >> > >> >> >> >>
>>>> > >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <
>>>> > >> > ray.gavin97@gmail.com>
>>>> > >> > >> >> >> wrote:
>>>> > >> > >> >> >> >>
>>>> > >> > >> >> >> >> > I am enthusiastic about Substrait and have followed its progress
>>>> > >> > >> >> >> >> > eagerly =D
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > When I presented it as a tentative option, there were
>>>> > >> > >> reservations
>>>> > >> > >> >> >> >> because
>>>> > >> > >> >> >> >> > of the project/spec being young and the functionality
>>>> > still
>>>> > >> > >> being
>>>> > >> > >> >> >> >> > fleshed out.
>>>> > >> > >> >> >> >> > I think if I were having this conversation in say,
>>>> 8-16
>>>> > >> > months,
>>>> > >> > >> it
>>>> > >> > >> >> >> would
>>>> > >> > >> >> >> >> > have been an easy choice, no doubt.
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > On a public mailing list (and I can share more details
>>>> > in
>>>> > >> > >> private
>>>> > >> > >> >> if
>>>> > >> > >> >> >> >> you're
>>>> > >> > >> >> >> >> > curious), the gist of it is this:
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for
>>>> > >> > expressing
>>>> > >> > >> >> data
>>>> > >> > >> >> >> >> > compute operations between services would be a useful
>>>> > thing
>>>> > >> > to
>>>> > >> > >> have
>>>> > >> > >> >> >> >> > (Especially if it's language-agnostic)
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > The goal is for an "implementing service" to have:
>>>> > >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to
>>>> > me")
>>>> > >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform
>>>> > this
>>>> > >> > >> operation
>>>> > >> > >> >> >> on
>>>> > >> > >> >> >> >> your
>>>> > >> > >> >> >> >> > data")
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > With FlightSQL this is possible I believe, but it
>>>> > requires
>>>> > >> > the
>>>> > >> > >> >> >> operation
>>>> > >> > >> >> >> >> to
>>>> > >> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > Working with some programmatic, structured object that
>>>> > has
>>>> > >> > the
>>>> > >> > >> same
>>>> > >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query
>>>> > would
>>>> > >> > >> have,
>>>> > >> > >> >> >> would
>>>> > >> > >> >> >> >> be
>>>> > >> > >> >> >> >> > a better experience
>>>> > >> > >> >> >> >> > (Jacques is on to something here!)
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > This interface between services would be somewhat the
>>>> > >> > >> equivalent of
>>>> > >> > >> >> >> an
>>>> > >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed
>>>> > library
>>>> > >> > for
>>>> > >> > >> >> >> >> expressing
>>>> > >> > >> >> >> >> > and building-up query/data-compute ops.
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <
>>>> > >> lidavidm@apache.org
>>>> > >> > >
>>>> > >> > >> >> wrote:
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
>>>> > >> > >> >> >> >> > >
>>>> > >> > >> >> >> >> > > Which is being worked on by several people,
>>>> including
>>>> > >> Arrow
>>>> > >> > >> >> >> community
>>>> > >> > >> >> >> >> > > members.
>>>> > >> > >> >> >> >> > >
>>>> > >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to
>>>> > >> include
>>>> > >> > >> >> >> support for
>>>> > >> > >> >> >> >> > > Substrait. I'm curious what your application, if
>>>> > you're
>>>> > >> > able
>>>> > >> > >> to
>>>> > >> > >> >> >> share
>>>> > >> > >> >> >> >> > more.
>>>> > >> > >> >> >> >> > >
>>>> > >> > >> >> >> >> > > -David
>>>> > >> > >> >> >> >> > >
>>>> > >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
>>>> > >> > >> >> >> >> > > > Hiya,
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > I am drafting a proposal for a way to enable
>>>> > services
>>>> > >> to
>>>> > >> > >> >> express
>>>> > >> > >> >> >> data
>>>> > >> > >> >> >> >> > > > compute operations to each other.
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in
>>>> if
>>>> > the
>>>> > >> > only
>>>> > >> > >> >> >> >> > > representation
>>>> > >> > >> >> >> >> > > > for queries is as SQL strings.
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > Is there any kind of lower-level API that can be
>>>> > used
>>>> > >> to
>>>> > >> > >> >> express
>>>> > >> > >> >> >> >> > > operations?
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > A structured representation like:
>>>> > >> > >> >> >> >> > > > {
>>>> > >> > >> >> >> >> > > >   "op": "query",
>>>> > >> > >> >> >> >> > > >   "schema": "user",
>>>> > >> > >> >> >> >> > > >   "project": ["name"]
>>>> > >> > >> >> >> >> > > > }
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>>>> > >> > >> >> >> >> > > >
>>>> > >> > >> >> >> >> > > > Thank you =)
>>>> > >> > >> >> >> >> > >
>>>> > >> > >> >> >> >> >
>>>> > >> > >> >> >> >>
>>>> > >> > >> >> >> >>
>>>> > >> > >> >> >>
>>>> > >> > >> >> >
>>>> > >> > >> >>
>>>> > >> > >>
>>>> > >> >
>>>> > >>
>>>> >
>>>>
>>>>
>>>> --
>>>>
>>>> *James Duong*
>>>> Lead Software Developer
>>>> Bit Quill Technologies Inc.
>>>> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
>>>> https://www.bitquilltech.com
>>>>
>>>> This email message is for the sole use of the intended recipient(s) and may
>>>> contain confidential and privileged information.  Any unauthorized review,
>>>> use, disclosure, or distribution is prohibited.  If you are not the
>>>> intended recipient, please contact the sender by reply email and destroy
>>>> all copies of the original message.  Thank you.
>>>>

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Posted by David Li <li...@apache.org>.
I quickly drafted these out (sans implementation so far): https://github.com/apache/arrow/pull/13492
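
One question raised earlier in the thread (quoted below) is how a client learns whether a server accepts Substrait plans; the initial thought was a property in GetSqlInfo, checked before the client picks a command type. A minimal sketch of that check follows, in plain Python. It is an illustration only: SUPPORTS_SUBSTRAIT is a made-up info ID, and Command and choose_command are hypothetical helpers, not part of Flight SQL or the draft.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical GetSqlInfo ID meaning "server accepts Substrait plans".
# The real ID, if one exists, is whatever the spec assigns.
SUPPORTS_SUBSTRAIT = 9999


@dataclass
class Command:
    kind: str       # "substrait" or "sql"
    payload: bytes  # serialized plan or UTF-8 encoded SQL text


def choose_command(sql_info: Dict[int, object], sql: str,
                   plan: Optional[bytes] = None) -> Command:
    """Send a Substrait plan only if the server advertised support for it."""
    if plan is not None and sql_info.get(SUPPORTS_SUBSTRAIT) is True:
        return Command("substrait", plan)
    # Fall back to the SQL string for servers that did not advertise support.
    return Command("sql", sql.encode("utf-8"))
```

Checking first means a client should rarely hit the less obvious failure mode discussed in the thread, where an older server rejects the new command as an unknown request type.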

On Thu, Jun 30, 2022, at 21:20, David Li wrote:
> Ah - somehow I didn't think of that. Yes, we should just implement it 
> in the same way prepared statements are already implemented.
>
> On Thu, Jun 30, 2022, at 19:42, Micah Kornfield wrote:
>>>
>>> It would also then be good to make explicit the statefulness of
>>> connections in Flight SQL. While that is sort of an obvious constraint, it
>>> is at odds with how gRPC is usually used (especially in the presence of
>>> load balancing).
>>
>>
>> I'm not sure I understand where the statefulness requirements come in?
>> Could you elaborate?  It seems that a transaction could be an opaque ID on
>> operations?
>>
>> On Thu, Jun 30, 2022 at 2:47 PM James Duong <ja...@bitquilltech.com.invalid>
>> wrote:
>>
>>> This is a bit of a tangent from the original discussion about
>>> Substrait integration.
>>>
>>> Flight SQL would definitely benefit from transaction RPC commands for
>>> building bridge drivers. I'm also wondering if there should be an RPC call
>>> to cancel a running query, as opposed to just having the client terminate
>>> streams. This would allow a multi-process application to cancel work across
>>> processes.
>>>
>>> On Thu, Jun 30, 2022 at 1:35 PM David Li <li...@apache.org> wrote:
>>>
>>> > Reviving this discussion: would people be interested in seeing a
>>> > sketched-out CommandSubstraitQuery et al.?
>>> >
>>> > Additionally, while working on ADBC, I realized: does Flight SQL need
>>> > explicit Commit/Rollback commands? This would presumably be necessary if
>>> we
>>> > want to build ODBC/JDBC drivers on top, since those standards have
>>> explicit
>>> > commands, and Flight SQL doesn't have the luxury of a driver to issue
>>> > database-specific SQL to implement these.
>>> >
>>> > It would also then be good to make explicit the statefulness of
>>> > connections in Flight SQL. While that is sort of an obvious constraint,
>>> it
>>> > is at odds with how gRPC is usually used (especially in the presence of
>>> > load balancing).
>>> >
>>> > On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
>>> > > Got it, thank you David!
>>> > > I started prototyping the implementation last night, hopefully I will
>>> > make
>>> > > some good progress and have something basic functioning soon.
>>> > >
>>> > > RE: The metadata thing -- I think both Calcite and Teiid have solid
>>> > > interfaces for defining what capabilities a datasource has.
>>> > >
>>> >
>>> https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
>>> > >
>>> > > It's probably not possible to make something universal, but it seems
>>> like
>>> > > you could get pretty close to most common functionality/capabilities
>>> > >
>>> > >
>>> > > On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <kylep@bitquilltech.com
>>> > .invalid>
>>> > > wrote:
>>> > >
>>> > >> Yes, we should, where possible, avoid any one-off metadata. This is where
>>> > >> other standards fail, in that applications must be custom built for each
>>> > >> data source; if we standardize the metadata, then applications can at least
>>> > >> be built to adapt.
>>> > >>
>>> > >> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <li...@apache.org>
>>> wrote:
>>> > >>
>>> > >> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's
>>> > use, so
>>> > >> > the application can use others for its own purposes. That said if
>>> they
>>> > >> seem
>>> > >> > commonly applicable maybe we should try to standardize them.
>>> > >> >
>>> > >> > I think what you are doing should be reasonable. You may not need
>>> > _all_
>>> > >> of
>>> > >> > the capabilities in Flight SQL for this (e.g. all the various
>>> metadata
>>> > >> > calls, or prepared statements, perhaps) but I don't see why it
>>> > wouldn't
>>> > >> > work for you.
>>> > >> >
>>> > >> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
>>> > >> > > To touch on the question about supported features -- is it possible to
>>> > >> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
>>> > >> > > Say that you want to represent some set of behaviors that FlightSQL
>>> > >> > > services can support.
>>> > >> > >
>>> > >> > > Stuff like "Supports grouping by multiple distinct aggregates",
>>> > >> "Supports
>>> > >> > > self-joins on aliased tables" etc
>>> > >> > > This is going to be unique to each implementation, but I couldn't
>>> > >> > determine
>>> > >> > > whether there was a way to express arbitrary capabilities
>>> > >> > >
>>> > >> > > Also, in case it's helpful I put together an ASCII diagram of what
>>> > I'm
>>> > >> > > trying to do with FlightSQL
>>> > >> > > If anyone has a moment, would appreciate input on whether it's
>>> > >> feasible/a
>>> > >> > > good idea
>>> > >> > >
>>> > >> > > https://pastebin.com/raw/VF2r0F3f
>>> > >> > >
>>> > >> > > Thank you =)
>>> > >> > >
>>> > >> > >
>>> > >> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <li...@apache.org>
>>> > wrote:
>>> > >> > >
>>> > >> > >> We could also add say CommandSubstraitQuery as a distinct
>>> message,
>>> > and
>>> > >> > >> older servers would just reject it as an unknown request type.
>>> > >> > >>
>>> > >> > >> -David
>>> > >> > >>
>>> > >> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
>>> > >> > >> >>
>>> > >> > >> >> 1. How does a server report that it supports each command
>>> type?
>>> > >> > Initial
>>> > >> > >> >> thought is a property in GetSqlInfo.
>>> > >> > >> >
>>> > >> > >> >
>>> > >> > >> > This sounds reasonable.
>>> > >> > >> >
>>> > >> > >> >
>>> > >> > >> >> What happens to client code written prior to changing the
>>> > command
>>> > >> > type
>>> > >> > >> >> to be a oneOf field? Same for servers.
>>> > >> > >> >
>>> > >> > >> >
>>> > >> > >> > It is transparent from older clients (I'm 99% sure the wire
>>> > protocol
>>> > >> > >> > doesn't change).  Servers is a little harder.  The one saving
>>> > grace
>>> > >> > is I
>>> > >> > >> > don't think an empty/not-present SQL string would be something
>>> > most
>>> > >> > >> servers
>>> > >> > >> > could handle, so they would probably error with something that
>>> > while
>>> > >> > >> > not-obvious would give a clue to the clients (but hopefully
>>> this
>>> > >> would
>>> > >> > >> be a
>>> > >> > >> > non-issue because the capabilities would be checked for clients
>>> > >> > wishing
>>> > >> > >> to
>>> > >> > >> > use this feature first).
>>> > >> > >> >
>>> > >> > >> > -Micah
>>> > >> > >> >
>>> > >> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <
>>> > jamesd@bitquilltech.com
>>> > >> > >> .invalid>
>>> > >> > >> > wrote:
>>> > >> > >> >
>>> > >> > >> >> It sounds like an interesting and useful project to use
>>> > Substrait
>>> > >> > as an
>>> > >> > >> >> alternative to SQL strings.
>>> > >> > >> >>
>>> > >> > >> >> Important aspects to spec out are:
>>> > >> > >> >> 1. How does a server report that it supports each command
>>> type?
>>> > >> > Initial
>>> > >> > >> >> thought is a property in GetSqlInfo.
>>> > >> > >> >> 2. What happens to client code written prior to changing the
>>> > >> command
>>> > >> > >> type
>>> > >> > >> >> to be a oneOf field? Same for servers.
>>> > >> > >> >> More generally, how should backward compatibility work, and
>>> what
>>> > >> > should
>>> > >> > >> >> happen if a client sends an unsupported
>>> > >> > >> >> command type to a server.
>>> > >> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait
>>> > >> > structures?
>>> > >> > >> >>
>>> > >> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <
>>> > ray.gavin97@gmail.com>
>>> > >> > >> wrote:
>>> > >> > >> >>
>>> > >> > >> >> > @James Duong <ja...@bitquilltech.com>
>>> > >> > >> >> >
>>> > >> > >> >> > You are absolutely right, I realized this and confirmed
>>> > whether
>>> > >> > this
>>> > >> > >> >> > would be possible with Jacques to double-check.
>>> > >> > >> >> > It would amount to what I might call "dollar-store
>>> Substrait."
>>> > >> It's
>>> > >> > >> not
>>> > >> > >> >> > elegant or a good solution, but definitely presents a good
>>> > >> > duct-tape
>>> > >> > >> hack
>>> > >> > >> >> > and is a crafty idea.
>>> > >> > >> >> >
>>> > >> > >> >> > I agree with Jacques -- when you think about FlightSQL, what
>>> > you
>>> > >> > are
>>> > >> > >> >> > attempting with a query isn't necessarily SQL, but a general
>>> > >> > >> data-compute
>>> > >> > >> >> > operation.
>>> > >> > >> >> > SQL just so happens to be a fairly universal way to express
>>> > them,
>>> > >> > >> with an
>>> > >> > >> >> > ANSI standard, but FlightSQL doesn't recognize any
>>> particular
>>> > >> > subset
>>> > >> > >> of
>>> > >> > >> >> it
>>> > >> > >> >> > and for all intents and purposes it doesn't matter what the
>>> > >> > operation
>>> > >> > >> >> > string contains.
>>> > >> > >> >> >
>>> > >> > >> >> > Substrait would make a fantastic logical next-feature
>>> because
>>> > >> it's
>>> > >> > >> >> > targeted as a specification for expressing relational
>>> algebra
>>> > and
>>> > >> > >> >> > data-compute operations
>>> > >> > >> >> > This more-or-less equates to SQL strings (in my mind at
>>> least)
>>> > >> > with a
>>> > >> > >> >> much
>>> > >> > >> >> > better toolkit and Dev UX. If there is anything I can do to
>>> > help
>>> > >> > move
>>> > >> > >> >> this
>>> > >> > >> >> > forward, please let me know because I am extremely motivated
>>> > to
>>> > >> do
>>> > >> > so.
>>> > >> > >> >> >
>>> > >> > >> >> > @David Li <gi...@lidavidm.me>
>>> > >> > >> >> >
>>> > >> > >> >> > Also agreed. Substrait is put together by folks much smarter
>>> > than
>>> > >> > >> myself,
>>> > >> > >> >> > and if I had to hedge my bets, I'd put money on it being the
>>> > >> > future of
>>> > >> > >> >> > data-compute interop.
>>> > >> > >> >> > I would love nothing more than to adopt this technology and
>>> > push
>>> > >> it
>>> > >> > >> >> along.
>>> > >> > >> >> >
>>> > >> > >> >> > Your project does sound interesting - basically, it sounds
>>> > like a
>>> > >> > >> tabular
>>> > >> > >> >> >> data storage service with query pushdown?
>>> > >> > >> >> >>
>>> > >> > >> >> >
>>> > >> > >> >> > Yeah this is more or less the details of it (my personal
>>> > email,
>>> > >> > with
>>> > >> > >> >> > discretion assumed, is always open)
>>> > >> > >> >> >
>>> > >> > >> >> > Imagine an environment where a backend wants to advertise
>>> some
>>> > >> > kind of
>>> > >> > >> >> > schema/data catalog
>>> > >> > >> >> >
>>> > >> > >> >> > And then a central service introspects these backends, and
>>> > >> > dynamically
>>> > >> > >> >> > generates an API from the data catalogues/schemas, where
>>> > requests
>>> > >> > get
>>> > >> > >> >> > proxied to the underlying backend service for each schema to
>>> > >> > actually
>>> > >> > >> be
>>> > >> > >> >> > executed
>>> > >> > >> >> >
>>> > >> > >> >> > In text, the flow would look something like:
>>> > >> > >> >> >
>>> > >> > >> >> >
>>> > >> > >> >> >                                                    <----> Data Provider Backend 0
>>> > >> > >> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
>>> > >> > >> >> >                                                    <----> Data Provider Backend 2
>>> > >> > >> >> >
>>> > >> > >> >> >
>>> > >> > >> >> >
>>> > >> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <
>>> lidavidm@apache.org>
>>> > >> > wrote:
>>> > >> > >> >> >
>>> > >> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an
>>> > >> > >> alternative to
>>> > >> > >> >> >> Substrait, at least one that isn't even more nascent or one
>>> > >> that's
>>> > >> > >> very
>>> > >> > >> >> >> tied to a particular language, so perhaps it might be
>>> better
>>> > to
>>> > >> > get
>>> > >> > >> >> >> involved in Substrait and see if it suits your needs?
>>> > >> Convincing a
>>> > >> > >> team
>>> > >> > >> >> to
>>> > >> > >> >> >> try something new can be hard, though, and it is somewhat
>>> of
>>> > a
>>> > >> > moving
>>> > >> > >> >> >> target - but Flight SQL is in a similar spot, I think, as
>>> > it's
>>> > >> > still
>>> > >> > >> >> >> getting enhancements.
>>> > >> > >> >> >>
>>> > >> > >> >> >> Your project does sound interesting - basically, it sounds
>>> > like
>>> > >> a
>>> > >> > >> >> tabular
>>> > >> > >> >> >> data storage service with query pushdown?
>>> > >> > >> >> >>
>>> > >> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
>>> > >> > >> >> >> > James, I agree that you could use JSON but that feels a
>>> bit
>>> > >> > hacky
>>> > >> > >> >> >> > (mis-use
>>> > >> > >> >> >> > of the paradigm). Instead, I'd really like to do
>>> something
>>> > >> like
>>> > >> > >> David
>>> > >> > >> >> is
>>> > >> > >> >> >> > suggesting: support Substrait as an alternative to a SQL
>>> > >> string.
>>> > >> > >> >> >> > Something like this:
>>> > >> > >> >> >> >
>>> > >> > >> >> >>
>>> > >> > >> >>
>>> > >> > >>
>>> > >> >
>>> > >>
>>> >
>>> https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
>>> > >> > >> >> >> >
>>> > >> > >> >> >> > It would be great if someone wanted to pick this up. It
>>> > would
>>> > >> > be a
>>> > >> > >> >> nice
>>> > >> > >> >> >> > enhancement to FlightSQL (and provide a structured way to
>>> > >> > express
>>> > >> > >> >> >> > operations).
>>> > >> > >> >> >> >
>>> > >> > >> >> >> >
>>> > >> > >> >> >> >
>>> > >> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <
>>> > >> > >> jamesd@bitquilltech.com
>>> > >> > >> >> >> .invalid>
>>> > >> > >> >> >> > wrote:
>>> > >> > >> >> >> >
>>> > >> > >> >> >> >> In the same way that you could write an ODBC driver that
>>> > >> takes
>>> > >> > in
>>> > >> > >> >> text
>>> > >> > >> >> >> >> that's not SQL, you could write a Flight SQL server that
>>> > >> takes
>>> > >> > in
>>> > >> > >> >> text
>>> > >> > >> >> >> >> that's JSON.
>>> > >> > >> >> >> >> Flight SQL doesn't parse the query, so you could create
>>> > >> > commands
>>> > >> > >> that
>>> > >> > >> >> >> are
>>> > >> > >> >> >> >> just JSON text.
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> Is that the only bit you need, Gavin?
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <
>>> > >> > ray.gavin97@gmail.com>
>>> > >> > >> >> >> wrote:
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> > I am enthusiastic about Substrait and have followed
>>> its
>>> > >> > >> progress
>>> > >> > >> >> >> eagerly
>>> > >> > >> >> >> >> > =D
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > When I presented it as a tentative option, there were
>>> > >> > >> reservations
>>> > >> > >> >> >> >> because
>>> > >> > >> >> >> >> > of the project/spec being young and the functionality
>>> > still
>>> > >> > >> being
>>> > >> > >> >> >> >> > fleshed out.
>>> > >> > >> >> >> >> > I think if I were having this conversation in say,
>>> 8-16
>>> > >> > months,
>>> > >> > >> it
>>> > >> > >> >> >> would
>>> > >> > >> >> >> >> > have been an easy choice, no doubt.
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > On a public mailing list (and I can share more details
>>> > in
>>> > >> > >> private
>>> > >> > >> >> if
>>> > >> > >> >> >> >> you're
>>> > >> > >> >> >> >> > curious), the gist of it is this:
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for
>>> > >> > expressing
>>> > >> > >> >> data
>>> > >> > >> >> >> >> > compute operations between services would be a useful
>>> > thing
>>> > >> > to
>>> > >> > >> have
>>> > >> > >> >> >> >> > (Especially if it's language-agnostic)
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > The goal is for an "implementing service" to have:
>>> > >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to
>>> > me")
>>> > >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform
>>> > this
>>> > >> > >> operation
>>> > >> > >> >> >> on
>>> > >> > >> >> >> >> your
>>> > >> > >> >> >> >> > data")
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > With FlightSQL this is possible I believe, but it
>>> > requires
>>> > >> > the
>>> > >> > >> >> >> operation
>>> > >> > >> >> >> >> to
>>> > >> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > Working with some programmatic, structured object that
>>> > has
>>> > >> > the
>>> > >> > >> same
>>> > >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query
>>> > would
>>> > >> > >> have,
>>> > >> > >> >> >> would
>>> > >> > >> >> >> >> be
>>> > >> > >> >> >> >> > a better experience
>>> > >> > >> >> >> >> > (Jacques is on to something here!)
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > This interface between services would be somewhat the
>>> > >> > >> equivalent of
>>> > >> > >> >> >> an
>>> > >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed
>>> > library
>>> > >> > for
>>> > >> > >> >> >> >> expressing
>>> > >> > >> >> >> >> > and building-up query/data-compute ops.
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <
>>> > >> lidavidm@apache.org
>>> > >> > >
>>> > >> > >> >> wrote:
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
>>> > >> > >> >> >> >> > >
>>> > >> > >> >> >> >> > > Which is being worked on by several people,
>>> including
>>> > >> Arrow
>>> > >> > >> >> >> community
>>> > >> > >> >> >> >> > > members.
>>> > >> > >> >> >> >> > >
>>> > >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to
>>> > >> include
>>> > >> > >> >> >> support for
>>> > >> > >> >> >> >> > > Substrait. I'm curious what your application is, if
>>> > you're
>>> > >> > able
>>> > >> > >> to
>>> > >> > >> >> >> share
>>> > >> > >> >> >> >> > more.
>>> > >> > >> >> >> >> > >
>>> > >> > >> >> >> >> > > -David
>>> > >> > >> >> >> >> > >
>>> > >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
>>> > >> > >> >> >> >> > > > Hiya,
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > I am drafting a proposal for a way to enable
>>> > services
>>> > >> to
>>> > >> > >> >> express
>>> > >> > >> >> >> data
>>> > >> > >> >> >> >> > > > compute operations to each other.
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in
>>> if
>>> > the
>>> > >> > only
>>> > >> > >> >> >> >> > > representation
>>> > >> > >> >> >> >> > > > for queries is as SQL strings.
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > Is there any kind of lower-level API that can be
>>> > used
>>> > >> to
>>> > >> > >> >> express
>>> > >> > >> >> >> >> > > operations?
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > A structured representation like:
>>> > >> > >> >> >> >> > > > {
>>> > >> > >> >> >> >> > > >   "op": "query",
>>> > >> > >> >> >> >> > > >   "schema": "user",
>>> > >> > >> >> >> >> > > >   "project": ["name"]
>>> > >> > >> >> >> >> > > > }
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>>> > >> > >> >> >> >> > > >
>>> > >> > >> >> >> >> > > > Thank you =)
>>> > >> > >> >> >> >> > >
>>> > >> > >> >> >> >> >
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> --
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> *James Duong*
>>> > >> > >> >> >> >> Lead Software Developer
>>> > >> > >> >> >> >> Bit Quill Technologies Inc.
>>> > >> > >> >> >> >> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
>>> > >> > >> >> >> >> https://www.bitquilltech.com
>>> > >> > >> >> >> >>
>>> > >> > >> >> >> >> This email message is for the sole use of the intended
>>> > >> > >> recipient(s)
>>> > >> > >> >> >> and may
>>> > >> > >> >> >> >> contain confidential and privileged information.  Any
>>> > >> > unauthorized
>>> > >> > >> >> >> review,
>>> > >> > >> >> >> >> use, disclosure, or distribution is prohibited.  If you
>>> > are
>>> > >> not
>>> > >> > >> the
>>> > >> > >> >> >> >> intended recipient, please contact the sender by reply
>>> > email
>>> > >> > and
>>> > >> > >> >> >> destroy
>>> > >> > >> >> >> >> all copies of the original message.  Thank you.
>>> > >> > >> >> >> >>
>>> > >> > >> >> >>
>>> > >> > >> >> >
>>> > >> > >> >>
>>> > >> > >>
>>> > >> >
>>> > >>
>>> >
>>>
>>>

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Posted by David Li <li...@apache.org>.
Ah - somehow I didn't think of that. Yes, we should just implement it in the same way prepared statements are already implemented.
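
A minimal sketch of that idea, assuming a hypothetical in-memory server: a transaction is identified by an opaque handle that the client echoes back on each operation, much as a prepared statement handle is passed around today. The names TransactionStore, begin, execute, and end are illustrative only and are not part of Flight SQL.

```python
import secrets


class TransactionStore:
    """Hypothetical server-side registry keyed by opaque transaction handles."""

    def __init__(self):
        self._pending = {}   # handle -> list of buffered statements
        self.committed = []  # statements applied by a commit

    def begin(self) -> bytes:
        # The handle is opaque to the client; it only echoes it back.
        handle = secrets.token_bytes(16)
        self._pending[handle] = []
        return handle

    def execute(self, handle: bytes, statement: str) -> None:
        # Any request carrying a live handle can add work to the transaction.
        if handle not in self._pending:
            raise KeyError("unknown or finished transaction")
        self._pending[handle].append(statement)

    def end(self, handle: bytes, commit: bool) -> None:
        # Commit applies the buffered statements; rollback discards them.
        statements = self._pending.pop(handle)
        if commit:
            self.committed.extend(statements)


store = TransactionStore()
txn = store.begin()
store.execute(txn, "INSERT INTO t VALUES (1)")
store.end(txn, commit=True)
```

Because the handle travels with each request, the state is tied to the ID rather than to one gRPC connection, though every backend behind a load balancer would still need to be able to resolve the same handle.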

On Thu, Jun 30, 2022, at 19:42, Micah Kornfield wrote:
>>
>> It would also then be good to make explicit the statefulness of
>> connections in Flight SQL. While that is sort of an obvious constraint, it
>> is at odds with how gRPC is usually used (especially in the presence of
>> load balancing).
>
>
> I'm not sure I understand where the statefulness requirements come in?
> Could you elaborate?  It seems that a transaction could be an opaque ID on
> operations?
>
> On Thu, Jun 30, 2022 at 2:47 PM James Duong <ja...@bitquilltech.com.invalid>
> wrote:
>
>> This is a bit of a tangent from the original discussion about
>> Substrait integration.
>>
>> Flight SQL would definitely benefit from transaction RPC commands for
>> building bridge drivers. I'm also wondering if there should be an RPC call
>> to cancel a running query, as opposed to just having the client terminate
>> streams. This would allow a multi-process application to cancel work across
>> processes.
>>
>> On Thu, Jun 30, 2022 at 1:35 PM David Li <li...@apache.org> wrote:
>>
>> > Reviving this discussion: would people be interested in seeing a
>> > sketched-out CommandSubstraitQuery et al.?
>> >
>> > Additionally, while working on ADBC, I realized: does Flight SQL need
>> > explicit Commit/Rollback commands? This would presumably be necessary if
>> we
>> > want to build ODBC/JDBC drivers on top, since those standards have
>> explicit
>> > commands, and Flight SQL doesn't have the luxury of a driver to issue
>> > database-specific SQL to implement these.
>> >
>> > It would also then be good to make explicit the statefulness of
>> > connections in Flight SQL. While that is sort of an obvious constraint,
>> it
>> > is at odds with how gRPC is usually used (especially in the presence of
>> > load balancing).
>> >
>> > On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
>> > > Got it, thank you David!
>> > > I started prototyping the implementation last night, hopefully I will
>> > make
>> > > some good progress and have something basic functioning soon.
>> > >
>> > > RE: The metadata thing -- I think both Calcite and Teiid have solid
>> > > interfaces for defining what capabilities a datasource has.
>> > >
>> >
>> https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
>> > >
>> > > It's probably not possible to make something universal, but it seems
>> like
>> > > you could get pretty close to most common functionality/capabilities
>> > >
>> > >
>> > > On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <kylep@bitquilltech.com
>> > .invalid>
>> > > wrote:
>> > >
>> > >> Yes, we should, where possible, avoid any one-off metadata. This is
>> where
>> > >> other standards fail in that applications must be custom built for
>> each
>> > >> data source, if we standardize the metadata then applications can at
>> > least
>> > >> be built to adapt.
>> > >>
>> > >> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <li...@apache.org>
>> wrote:
>> > >>
>> > >> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's
>> > use, so
>> > >> > the application can use others for its own purposes. That said if
>> they
>> > >> seem
>> > >> > commonly applicable maybe we should try to standardize them.
>> > >> >
>> > >> > I think what you are doing should be reasonable. You may not need
>> > _all_
>> > >> of
>> > >> > the capabilities in Flight SQL for this (e.g. all the various
>> metadata
>> > >> > calls, or prepared statements, perhaps) but I don't see why it
>> > wouldn't
>> > >> > work for you.
>> > >> >
>> > >> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
>> > >> > > To touch on the question about supported features -- is it
>> possible
>> > to
>> > >> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
>> > >> > > Say that you want to represent some set of behaviors that
>> FlightSQL
>> > >> > > services can support.
>> > >> > >
>> > >> > > Stuff like "Supports grouping by multiple distinct aggregates",
>> > >> "Supports
>> > >> > > self-joins on aliased tables" etc
>> > >> > > This is going to be unique to each implementation, but I couldn't
>> > >> > determine
>> > >> > > whether there was a way to express arbitrary capabilities
>> > >> > >
>> > >> > > Also, in case it's helpful I put together an ASCII diagram of what
>> > I'm
>> > >> > > trying to do with FlightSQL
>> > >> > > If anyone has a moment, would appreciate input on whether it's
>> > >> feasible/a
>> > >> > > good idea
>> > >> > >
>> > >> > > https://pastebin.com/raw/VF2r0F3f
>> > >> > >
>> > >> > > Thank you =)
>> > >> > >
>> > >> > >
>> > >> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <li...@apache.org>
>> > wrote:
>> > >> > >
>> > >> > >> We could also add say CommandSubstraitQuery as a distinct
>> message,
>> > and
>> > >> > >> older servers would just reject it as an unknown request type.
>> > >> > >>
>> > >> > >> -David
>> > >> > >>
>> > >> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
>> > >> > >> >>
>> > >> > >> >> 1. How does a server report that it supports each command
>> type?
>> > >> > Initial
>> > >> > >> >> thought is a property in GetSqlInfo.
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > This sounds reasonable.
>> > >> > >> >
>> > >> > >> >
>> > >> > >> >> What happens to client code written prior to changing the
>> > command
>> > >> > type
>> > >> > >> >> to be a oneOf field? Same for servers.
>> > >> > >> >
>> > >> > >> >
>> > >> > >> > It is transparent from older clients (I'm 99% sure the wire
>> > protocol
>> > >> > >> > doesn't change).  Servers is a little harder.  The one saving
>> > grace
>> > >> > is I
>> > >> > >> > don't think an empty/not-present SQL string would be something
>> > most
>> > >> > >> servers
>> > >> > >> > could handle, so they would probably error with something that
>> > while
>> > >> > >> > not-obvious would give a clue to the clients (but hopefully
>> this
>> > >> would
>> > >> > >> be a
>> > >> > >> > non-issue because the capabilities would be checked for clients
>> > >> > wishing
>> > >> > >> to
>> > >> > >> > use this feature first).
>> > >> > >> >
>> > >> > >> > -Micah
>> > >> > >> >
>> > >> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <
>> > jamesd@bitquilltech.com
>> > >> > >> .invalid>
>> > >> > >> > wrote:
>> > >> > >> >
>> > >> > >> >> It sounds like an interesting and useful project to use
>> > Substrait
>> > >> > as an
>> > >> > >> >> alternative to SQL strings.
>> > >> > >> >>
>> > >> > >> >> Important aspects to spec out are:
>> > >> > >> >> 1. How does a server report that it supports each command
>> type?
>> > >> > Initial
>> > >> > >> >> thought is a property in GetSqlInfo.
>> > >> > >> >> 2. What happens to client code written prior to changing the
>> > >> command
>> > >> > >> type
>> > >> > >> >> to be a oneOf field? Same for servers.
>> > >> > >> >> More generally, how should backward compatibility work, and
>> what
>> > >> > should
>> > >> > >> >> happen if a client sends an unsupported
>> > >> > >> >> command type to a server.
>> > >> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait
>> > >> > structures?
>> > >> > >> >>
>> > >> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <
>> > ray.gavin97@gmail.com>
>> > >> > >> wrote:
>> > >> > >> >>
>> > >> > >> >> > @James Duong <ja...@bitquilltech.com>
>> > >> > >> >> >
>> > >> > >> >> > You are absolutely right, I realized this and confirmed
>> > whether
>> > >> > this
>> > >> > >> >> > would be possible with Jacques to double-check.
>> > >> > >> >> > It would amount to what I might call "dollar-store
>> Substrait."
>> > >> It's
>> > >> > >> not
>> > >> > >> >> > elegant or a good solution, but definitely presents a good
>> > >> > duct-tape
>> > >> > >> hack
>> > >> > >> >> > and is a crafty idea.
>> > >> > >> >> >
>> > >> > >> >> > I agree with Jacques -- when you think about FlightSQL, what
>> > you
>> > >> > are
>> > >> > >> >> > attempting with a query isn't necessarily SQL, but a general
>> > >> > >> data-compute
>> > >> > >> >> > operation.
>> > >> > >> >> > SQL just so happens to be a fairly universal way to express
>> > them,
>> > >> > >> with an
>> > >> > >> >> > ANSI standard, but FlightSQL doesn't recognize any
>> particular
>> > >> > subset
>> > >> > >> of
>> > >> > >> >> it
>> > >> > >> >> > and for all intents and purposes it doesn't matter what the
>> > >> > operation
>> > >> > >> >> > string contains.
>> > >> > >> >> >
>> > >> > >> >> > Substrait would make a fantastic logical next-feature
>> because
>> > >> it's
>> > >> > >> >> > targeted as a specification for expressing relational
>> algebra
>> > and
>> > >> > >> >> > data-compute operations
>> > >> > >> >> > This more-or-less equates to SQL strings (in my mind at
>> least)
>> > >> > with a
>> > >> > >> >> much
>> > >> > >> >> > better toolkit and Dev UX. If there is anything I can do to
>> > help
>> > >> > move
>> > >> > >> >> this
>> > >> > >> >> > forward, please let me know because I am extremely motivated
>> > to
>> > >> do
>> > >> > so.
>> > >> > >> >> >
>> > >> > >> >> > @David Li <gi...@lidavidm.me>
>> > >> > >> >> >
>> > >> > >> >> > Also agreed. Substrait is put together by folks much smarter
>> > than
>> > >> > >> myself,
>> > >> > >> >> > and if I had to hedge my bets, I'd put money on it being the
>> > >> > future of
>> > >> > >> >> > data-compute interop.
>> > >> > >> >> > I would love nothing more than to adopt this technology and
>> > push
>> > >> it
>> > >> > >> >> along.
>> > >> > >> >> >
>> > >> > >> >> > Your project does sound interesting - basically, it sounds
>> > like a
>> > >> > >> tabular
>> > >> > >> >> >> data storage service with query pushdown?
>> > >> > >> >> >>
>> > >> > >> >> >
>> > >> > >> >> > Yeah this is more or less the details of it (my personal
>> > email,
>> > >> > with
>> > >> > >> >> > discretion assumed, is always open)
>> > >> > >> >> >
>> > >> > >> >> > Imagine an environment where a backend wants to advertise
>> some
>> > >> > kind of
>> > >> > >> >> > schema/data catalog
>> > >> > >> >> >
>> > >> > >> >> > And then a central service introspects these backends, and
>> > >> > dynamically
>> > >> > >> >> > generates an API from the data catalogues/schemas, where
>> > requests
>> > >> > get
>> > >> > >> >> > proxied to the underlying backend service for each schema to
>> > >> > actually
>> > >> > >> be
>> > >> > >> >> > executed
>> > >> > >> >> >
>> > >> > >> >> > In text, the flow would look something like:
>> > >> > >> >> >
>> > >> > >> >> >
>> > >> > >> >> >                                                    <----> Data Provider Backend 0
>> > >> > >> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
>> > >> > >> >> >                                                    <----> Data Provider Backend 2
>> > >> > >> >> >
>> > >> > >> >> >
>> > >> > >> >> >
>> > >> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <
>> lidavidm@apache.org>
>> > >> > wrote:
>> > >> > >> >> >
>> > >> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an
>> > >> > >> alternative to
>> > >> > >> >> >> Substrait, at least one that isn't even more nascent or one
>> > >> that's
>> > >> > >> very
>> > >> > >> >> >> tied to a particular language, so perhaps it might be
>> better
>> > to
>> > >> > get
>> > >> > >> >> >> involved in Substrait and see if it suits your needs?
>> > >> Convincing a
>> > >> > >> team
>> > >> > >> >> to
>> > >> > >> >> >> try something new can be hard, though, and it is somewhat
>> of
>> > a
>> > >> > moving
>> > >> > >> >> >> target - but Flight SQL is in a similar spot, I think, as
>> > it's
>> > >> > still
>> > >> > >> >> >> getting enhancements.
>> > >> > >> >> >>
>> > >> > >> >> >> Your project does sound interesting - basically, it sounds
>> > like
>> > >> a
>> > >> > >> >> tabular
>> > >> > >> >> >> data storage service with query pushdown?
>> > >> > >> >> >>
>> > >> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
>> > >> > >> >> >> > James, I agree that you could use JSON but that feels a
>> bit
>> > >> > hacky
>> > >> > >> >> >> > (mis-use
>> > >> > >> >> >> > of the paradigm). Instead, I'd really like to do
>> something
>> > >> like
>> > >> > >> David
>> > >> > >> >> is
>> > >> > >> >> >> > suggesting: support Substrait as an alternative to a SQL
>> > >> string.
>> > >> > >> >> >> > Something like this:
>> > >> > >> >> >> >
>> > >> > >> >> >>
>> > >> > >> >>
>> > >> > >>
>> > >> >
>> > >>
>> >
>> https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
>> > >> > >> >> >> >
>> > >> > >> >> >> > It would be great if someone wanted to pick this up. It
>> > would
>> > >> > be a
>> > >> > >> >> nice
>> > >> > >> >> >> > enhancement to FlightSQL (and provide a structured way to
>> > >> > express
>> > >> > >> >> >> > operations).
>> > >> > >> >> >> >
>> > >> > >> >> >> >
>> > >> > >> >> >> >
>> > >> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <
>> > >> > >> jamesd@bitquilltech.com
>> > >> > >> >> >> .invalid>
>> > >> > >> >> >> > wrote:
>> > >> > >> >> >> >
>> > >> > >> >> >> >> In the same way that you could write an ODBC driver that
>> > >> takes
>> > >> > in
>> > >> > >> >> text
>> > >> > >> >> >> >> that's not SQL, you could write a Flight SQL server that
>> > >> takes
>> > >> > in
>> > >> > >> >> text
>> > >> > >> >> >> >> that's JSON.
>> > >> > >> >> >> >> Flight SQL doesn't parse the query, so you could create
>> > >> > commands
>> > >> > >> that
>> > >> > >> >> >> are
>> > >> > >> >> >> >> just JSON text.
>> > >> > >> >> >> >>
>> > >> > >> >> >> >> Is that the only bit you need, Gavin?
>> > >> > >> >> >> >>
>> > >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <
>> > >> > ray.gavin97@gmail.com>
>> > >> > >> >> >> wrote:
>> > >> > >> >> >> >>
>> > >> > >> >> >> >> > I am enthusiastic about Substrait and have followed
>> its
>> > >> > >> progress
>> > >> > >> >> >> eagerly
>> > >> > >> >> >> >> > =D
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > When I presented it as a tentative option, there were
>> > >> > >> reservations
>> > >> > >> >> >> >> because
>> > >> > >> >> >> >> > of the project/spec being young and the functionality
>> > still
>> > >> > >> being
>> > >> > >> >> >> >> > fleshed out.
>> > >> > >> >> >> >> > I think if I were having this conversation in say,
>> 8-16
>> > >> > months,
>> > >> > >> it
>> > >> > >> >> >> would
>> > >> > >> >> >> >> > have been an easy choice, no doubt.
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > On a public mailing list (and I can share more details
>> > in
>> > >> > >> private
>> > >> > >> >> if
>> > >> > >> >> >> >> you're
>> > >> > >> >> >> >> > curious), the gist of it is this:
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for
>> > >> > expressing
>> > >> > >> >> data
>> > >> > >> >> >> >> > compute operations between services would be a useful
>> > thing
>> > >> > to
>> > >> > >> have
>> > >> > >> >> >> >> > (Especially if it's language-agnostic)
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > The goal is for an "implementing service" to have:
>> > >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to
>> > me")
>> > >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform
>> > this
>> > >> > >> operation
>> > >> > >> >> >> on
>> > >> > >> >> >> >> your
>> > >> > >> >> >> >> > data")
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > With FlightSQL this is possible I believe, but it
>> > requires
>> > >> > the
>> > >> > >> >> >> operation
>> > >> > >> >> >> >> to
>> > >> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > Working with some programmatic, structured object that
>> > has
>> > >> > the
>> > >> > >> same
>> > >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query
>> > would
>> > >> > >> have,
>> > >> > >> >> >> would
>> > >> > >> >> >> >> be
>> > >> > >> >> >> >> > a better experience
>> > >> > >> >> >> >> > (Jacques is on to something here!)
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > This interface between services would be somewhat the
>> > >> > >> equivalent of
>> > >> > >> >> >> an
>> > >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed
>> > library
>> > >> > for
>> > >> > >> >> >> >> expressing
>> > >> > >> >> >> >> > and building-up query/data-compute ops.
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <
>> > >> lidavidm@apache.org
>> > >> > >
>> > >> > >> >> wrote:
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
>> > >> > >> >> >> >> > >
>> > >> > >> >> >> >> > > Which is being worked on by several people, including Arrow community
>> > >> > >> >> >> >> > > members.
>> > >> > >> >> >> >> > >
>> > >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to include support for
>> > >> > >> >> >> >> > > Substrait. I'm curious what your application is, if you're able to share
>> > >> > >> >> >> >> > > more.
>> > >> > >> >> >> >> > >
>> > >> > >> >> >> >> > > -David
>> > >> > >> >> >> >> > >
>> > >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
>> > >> > >> >> >> >> > > > Hiya,
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > I am drafting a proposal for a way to enable services to express data
>> > >> > >> >> >> >> > > > compute operations to each other.
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in if the only representation
>> > >> > >> >> >> >> > > > for queries is as SQL strings.
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > Is there any kind of lower-level API that can be used to express
>> > >> > >> >> >> >> > > > operations?
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > A structured representation like:
>> > >> > >> >> >> >> > > > {
>> > >> > >> >> >> >> > > >   "op": "query",
>> > >> > >> >> >> >> > > >   "schema": "user",
>> > >> > >> >> >> >> > > >   "project": ["name"]
>> > >> > >> >> >> >> > > > }
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
>> > >> > >> >> >> >> > > >
>> > >> > >> >> >> >> > > > Thank you =)
>> > >> > >> >> >> >> > >
>> > >> > >> >> >> >> >
>> > >> > >> >> >> >>
>> > >> > >> >> >> >>
>> > >> > >> >> >> >> --
>> > >> > >> >> >> >>
>> > >> > >> >> >> >> *James Duong*
>> > >> > >> >> >> >> Lead Software Developer
>> > >> > >> >> >> >> Bit Quill Technologies Inc.
>> > >> > >> >> >> >> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
>> > >> > >> >> >> >> https://www.bitquilltech.com
>> > >> > >> >> >> >>
>> > >> > >> >> >> >> This email message is for the sole use of the intended recipient(s) and may
>> > >> > >> >> >> >> contain confidential and privileged information.  Any unauthorized review,
>> > >> > >> >> >> >> use, disclosure, or distribution is prohibited.  If you are not the
>> > >> > >> >> >> >> intended recipient, please contact the sender by reply email and destroy
>> > >> > >> >> >> >> all copies of the original message.  Thank you.
>> > >> > >> >> >> >>
>> > >> > >> >> >>
>> > >> > >> >> >
>> > >> > >> >>
>> > >> > >>
>> > >> >
>> > >>
>> >
>>
>>
>>
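
To make the structured representation from the quoted thread concrete, here is a minimal sketch in Python. It assumes a hypothetical `QueryOp` type whose fields (`op`, `schema`, `project`) mirror the JSON example in Gavin's original message; neither the type nor its methods are part of Flight SQL or Substrait. Since Flight SQL does not parse the query text, either serialization could in principle be carried as the command payload.

```python
import json
from dataclasses import dataclass
from typing import Tuple


def _quote_ident(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes, double any inner quote.
    return '"' + name.replace('"', '""') + '"'


@dataclass(frozen=True)
class QueryOp:
    """Hypothetical typed form of the JSON example from the thread."""

    schema: str                 # table to read from ("schema" in the example)
    project: Tuple[str, ...]    # columns to return
    op: str = "query"

    def to_json(self) -> str:
        # Same shape as the JSON object sketched in the original message.
        return json.dumps(
            {"op": self.op, "schema": self.schema, "project": list(self.project)}
        )

    def to_sql(self) -> str:
        # Render the equivalent SQL string for servers that only accept SQL.
        cols = ", ".join(_quote_ident(c) for c in self.project)
        return f"SELECT {cols} FROM {_quote_ident(self.schema)}"


if __name__ == "__main__":
    q = QueryOp(schema="user", project=("name",))
    print(q.to_json())
    print(q.to_sql())
```

A typed object like this is what the "strongly-typed library" wish in the thread amounts to: callers build the operation once, and the transport decides whether to ship it as JSON, SQL, or (as proposed later in the thread) a Substrait plan.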

Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Posted by Micah Kornfield <em...@gmail.com>.
>
> It would also then be good to make explicit the statefulness of
> connections in Flight SQL. While that is sort of an obvious constraint, it
> is at odds with how gRPC is usually used (especially in the presence of
> load balancing).


I'm not sure I understand where the statefulness requirements come in?
Could you elaborate?  It seems that a transaction could be an opaque ID on
operations?
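
A minimal sketch of the opaque-ID idea, in Python, with every name (`TransactionStore`, `begin`, `execute`, `commit`, `rollback`) hypothetical; none of these are existing Flight SQL RPCs. The client carries the handle on every call, so transaction state is keyed by that handle rather than by the connection. The state still has to live somewhere that every backend behind a load balancer can reach, which the in-memory dict below does not model.

```python
import secrets


class TransactionStore:
    """Hypothetical server-side state keyed by an opaque transaction ID."""

    def __init__(self):
        self._committed = []   # statements that have been committed
        self._pending = {}     # txn_id -> statements buffered in that transaction

    def begin(self) -> bytes:
        # The ID is random bytes; the client treats it as opaque.
        txn_id = secrets.token_bytes(16)
        self._pending[txn_id] = []
        return txn_id

    def execute(self, txn_id: bytes, statement: str) -> None:
        # Each operation names its transaction explicitly.
        if txn_id not in self._pending:
            raise ValueError("unknown or finished transaction")
        self._pending[txn_id].append(statement)

    def commit(self, txn_id: bytes) -> None:
        self._committed.extend(self._finish(txn_id))

    def rollback(self, txn_id: bytes) -> None:
        self._finish(txn_id)   # discard buffered statements

    def _finish(self, txn_id: bytes):
        try:
            return self._pending.pop(txn_id)
        except KeyError:
            raise ValueError("unknown or finished transaction") from None

    @property
    def committed(self):
        return list(self._committed)
```

Two transactions started over the same connection stay independent here, because nothing is tied to the connection itself.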


Re: [FlightSQL] Structured/Serialized representation of query (like JSON) rather than SQL string possible?

Posted by James Duong <ja...@bitquilltech.com.INVALID>.
This is a bit of a tangent from the original discussion about
Substrait integration.

Flight SQL would definitely benefit from transaction RPC commands for
building bridge drivers. I'm also wondering if there should be an RPC call
to cancel a running query, as opposed to just having the client terminate
streams. This would allow a multi-process application to cancel work across
processes.
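
A rough sketch of what the server side of an explicit cancel call could look like, in Python. It assumes a hypothetical `QueryRegistry` that maps a query handle to a cancellation flag; nothing here is an existing Flight SQL API. Because cancellation is addressed by handle rather than by the stream that started the query, any process holding the handle could request it.

```python
import threading
import uuid


class QueryRegistry:
    """Hypothetical registry of running queries, keyed by an opaque handle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._flags = {}   # handle -> threading.Event set on cancellation

    def start(self) -> str:
        handle = uuid.uuid4().hex
        with self._lock:
            self._flags[handle] = threading.Event()
        return handle

    def cancel(self, handle: str) -> bool:
        # Returns False if the query is unknown or already finished.
        with self._lock:
            flag = self._flags.get(handle)
        if flag is None:
            return False
        flag.set()
        return True

    def is_cancelled(self, handle: str) -> bool:
        with self._lock:
            flag = self._flags.get(handle)
        return flag is not None and flag.is_set()

    def finish(self, handle: str) -> None:
        with self._lock:
            self._flags.pop(handle, None)


def run_query(registry: QueryRegistry, handle: str, batches):
    """Yield result batches, checking for cancellation before each one."""
    try:
        for batch in batches:
            if registry.is_cancelled(handle):
                return
            yield batch
    finally:
        registry.finish(handle)
```

The check happens between batches, so a cancel request takes effect at the next batch boundary rather than immediately.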

On Thu, Jun 30, 2022 at 1:35 PM David Li <li...@apache.org> wrote:

> Reviving this discussion: would people be interested in seeing a
> sketched-out CommandSubstraitQuery et al.?
>
> Additionally, while working on ADBC, I realized: does Flight SQL need
> explicit Commit/Rollback commands? This would presumably be necessary if we
> want to build ODBC/JDBC drivers on top, since those standards have explicit
> commands, and Flight SQL doesn't have the luxury of a driver to issue
> database-specific SQL to implement these.
>
> It would also then be good to make explicit the statefulness of
> connections in Flight SQL. While that is sort of an obvious constraint, it
> is at odds with how gRPC is usually used (especially in the presence of
> load balancing).
>
> On Sun, Mar 6, 2022, at 14:44, Gavin Ray wrote:
> > Got it, thank you David!
> > I started prototyping the implementation last night, hopefully I will make
> > some good progress and have something basic functioning soon.
> >
> > RE: The metadata thing -- I think both Calcite and Teiid have solid
> > interfaces for defining what capabilities a datasource has.
> > https://github.com/teiid/teiid/blob/8e9057a46be009d68b2d67701781f1f8c175baa7/api/src/main/java/org/teiid/translator/ExecutionFactory.java#L349-L1528
> >
> > It's probably not possible to make something universal, but it seems like
> > you could get pretty close to most common functionality/capabilities
> >
> >
> > On Sat, Mar 5, 2022 at 11:48 PM Kyle Porter <kylep@bitquilltech.com.invalid>
> > wrote:
> >
> >> Yes, we should, where possible, avoid any one-off metadata. This is where
> >> other standards fail, in that applications must be custom built for each
> >> data source; if we standardize the metadata then applications can at least
> >> be built to adapt.
> >>
> >> On Sat., Mar. 5, 2022, 6:54 p.m. David Li, <li...@apache.org> wrote:
> >>
> >> > Yes, GetSqlInfo reserves a range of metadata IDs for Flight SQL's use, so
> >> > the application can use others for its own purposes. That said, if they seem
> >> > commonly applicable maybe we should try to standardize them.
> >> >
> >> > I think what you are doing should be reasonable. You may not need _all_ of
> >> > the capabilities in Flight SQL for this (e.g. all the various metadata
> >> > calls, or prepared statements, perhaps) but I don't see why it wouldn't
> >> > work for you.
> >> >
> >> > On Fri, Mar 4, 2022, at 19:03, Gavin Ray wrote:
> >> > > To touch on the question about supported features -- is it possible to
> >> > > advertise arbitrary/custom "capabilities" in GetSqlInfo?
> >> > > Say that you want to represent some set of behaviors that FlightSQL
> >> > > services can support.
> >> > >
> >> > > Stuff like "Supports grouping by multiple distinct aggregates", "Supports
> >> > > self-joins on aliased tables" etc
> >> > > This is going to be unique to each implementation, but I couldn't determine
> >> > > whether there was a way to express arbitrary capabilities
> >> > >
> >> > > Also, in case it's helpful I put together an ASCII diagram of what I'm
> >> > > trying to do with FlightSQL
> >> > > If anyone has a moment, would appreciate input on whether it's feasible/a
> >> > > good idea
> >> > >
> >> > > https://pastebin.com/raw/VF2r0F3f
> >> > >
> >> > > Thank you =)
> >> > >
> >> > >
> >> > > On Fri, Mar 4, 2022 at 2:37 PM David Li <li...@apache.org> wrote:
> >> > >
> >> > >> We could also add, say, CommandSubstraitQuery as a distinct message, and
> >> > >> older servers would just reject it as an unknown request type.
> >> > >>
> >> > >> -David
> >> > >>
> >> > >> On Fri, Mar 4, 2022, at 17:01, Micah Kornfield wrote:
> >> > >> >>
> >> > >> >> 1. How does a server report that it supports each command type? Initial
> >> > >> >> thought is a property in GetSqlInfo.
> >> > >> >
> >> > >> >
> >> > >> > This sounds reasonable.
> >> > >> >
> >> > >> >
> >> > >> >> What happens to client code written prior to changing the command type
> >> > >> >> to be a oneOf field? Same for servers.
> >> > >> >
> >> > >> >
> >> > >> > It is transparent to older clients (I'm 99% sure the wire protocol
> >> > >> > doesn't change).  Servers are a little harder.  The one saving grace is I
> >> > >> > don't think an empty/not-present SQL string would be something most servers
> >> > >> > could handle, so they would probably error with something that, while
> >> > >> > not obvious, would give a clue to the clients (but hopefully this would be a
> >> > >> > non-issue because the capabilities would be checked first by clients wishing
> >> > >> > to use this feature).
> >> > >> >
> >> > >> > -Micah
> >> > >> >
> >> > >> > On Fri, Mar 4, 2022 at 1:50 PM James Duong <jamesd@bitquilltech.com.invalid>
> >> > >> > wrote:
> >> > >> >
> >> > >> >> It sounds like an interesting and useful project to use Substrait as an
> >> > >> >> alternative to SQL strings.
> >> > >> >>
> >> > >> >> Important aspects to spec out are:
> >> > >> >> 1. How does a server report that it supports each command type? Initial
> >> > >> >> thought is a property in GetSqlInfo.
> >> > >> >> 2. What happens to client code written prior to changing the command type
> >> > >> >> to be a oneOf field? Same for servers.
> >> > >> >> More generally, how should backward compatibility work, and what should
> >> > >> >> happen if a client sends an unsupported command type to a server?
> >> > >> >> 3. Should inputs to catalog RPC calls also accept Substrait structures?
> >> > >> >>
> >> > >> >> On Thu, Mar 3, 2022 at 11:00 PM Gavin Ray <ray.gavin97@gmail.com> wrote:
> >> > >> >>
> >> > >> >> > @James Duong <ja...@bitquilltech.com>
> >> > >> >> >
> >> > >> >> > You are absolutely right, I realized this and confirmed whether this
> >> > >> >> > would be possible with Jacques to double-check.
> >> > >> >> > It would amount to what I might call "dollar-store Substrait." It's not
> >> > >> >> > elegant or a good solution, but definitely presents a good duct-tape hack
> >> > >> >> > and is a crafty idea.
> >> > >> >> >
> >> > >> >> > I agree with Jacques -- when you think about FlightSQL, what you are
> >> > >> >> > attempting with a query isn't necessarily SQL, but a general data-compute
> >> > >> >> > operation.
> >> > >> >> > SQL just so happens to be a fairly universal way to express them, with an
> >> > >> >> > ANSI standard, but FlightSQL doesn't recognize any particular subset of it
> >> > >> >> > and for all intents and purposes it doesn't matter what the operation
> >> > >> >> > string contains.
> >> > >> >> >
> >> > >> >> > Substrait would make a fantastic logical next-feature because it's
> >> > >> >> > targeted as a specification for expressing relational algebra and
> >> > >> >> > data-compute operations.
> >> > >> >> > This more-or-less equates to SQL strings (in my mind at least) with a much
> >> > >> >> > better toolkit and Dev UX. If there is anything I can do to help move this
> >> > >> >> > forward, please let me know because I am extremely motivated to do so.
> >> > >> >> >
> >> > >> >> > @David Li <gi...@lidavidm.me>
> >> > >> >> >
> >> > >> >> > Also agreed. Substrait is put together by folks much smarter than myself,
> >> > >> >> > and if I had to hedge my bets, I'd put money on it being the future of
> >> > >> >> > data-compute interop.
> >> > >> >> > I would love nothing more than to adopt this technology and push it along.
> >> > >> >> >
> >> > >> >> >> Your project does sound interesting - basically, it sounds like a tabular
> >> > >> >> >> data storage service with query pushdown?
> >> > >> >> >>
> >> > >> >> >
> >> > >> >> > Yeah this is more or less the details of it (my personal email, with
> >> > >> >> > discretion assumed, is always open)
> >> > >> >> >
> >> > >> >> > Imagine an environment where a backend wants to advertise some kind of
> >> > >> >> > schema/data catalog
> >> > >> >> >
> >> > >> >> > And then a central service introspects these backends, and dynamically
> >> > >> >> > generates an API from the data catalogues/schemas, where requests get
> >> > >> >> > proxied to the underlying backend service for each schema to actually be
> >> > >> >> > executed
> >> > >> >> >
> >> > >> >> > In text, the flow would look something like:
> >> > >> >> >
> >> > >> >> >
> >> > >> >> >                                                     <----> Data Provider Backend 0
> >> > >> >> > Client <-----> Central Service <---> Generated API <----> Data-Provider Backend 1
> >> > >> >> >                                                     <----> Data Provider Backend 2
> >> > >> >> >
> >> > >> >> >
> >> > >> >> >
> >> > >> >> > On Thu, Mar 3, 2022 at 5:52 PM David Li <li...@apache.org> wrote:
> >> > >> >> >
> >> > >> >> >> Gavin, thanks for sharing. I'm not so sure you'll find an alternative to
> >> > >> >> >> Substrait, at least one that isn't even more nascent or one that's very
> >> > >> >> >> tied to a particular language, so perhaps it might be better to get
> >> > >> >> >> involved in Substrait and see if it suits your needs? Convincing a team to
> >> > >> >> >> try something new can be hard, though, and it is somewhat of a moving
> >> > >> >> >> target - but Flight SQL is in a similar spot, I think, as it's still
> >> > >> >> >> getting enhancements.
> >> > >> >> >>
> >> > >> >> >> Your project does sound interesting - basically, it sounds like a tabular
> >> > >> >> >> data storage service with query pushdown?
> >> > >> >> >>
> >> > >> >> >> On Thu, Mar 3, 2022, at 19:58, Jacques Nadeau wrote:
> >> > >> >> >> > James, I agree that you could use JSON but that feels a bit hacky
> >> > >> >> >> > (mis-use of the paradigm). Instead, I'd really like to do something like
> >> > >> >> >> > David is suggesting: support Substrait as an alternative to a SQL string.
> >> > >> >> >> > Something like this:
> >> > >> >> >> >
> >> > >> >> >> > https://github.com/jacques-n/arrow/commit/e22674fa882e77c2889cf95f69f6e3701db362bc
> >> > >> >> >> >
> >> > >> >> >> > It would be great if someone wanted to pick this up. It would be a nice
> >> > >> >> >> > enhancement to FlightSQL (and provide a structured way to express
> >> > >> >> >> > operations).
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> >
> >> > >> >> >> > On Thu, Mar 3, 2022 at 4:56 PM James Duong <jamesd@bitquilltech.com.invalid>
> >> > >> >> >> > wrote:
> >> > >> >> >> >
> >> > >> >> >> >> In the same way that you could write an ODBC driver that takes in text
> >> > >> >> >> >> that's not SQL, you could write a Flight SQL server that takes in text
> >> > >> >> >> >> that's JSON.
> >> > >> >> >> >> Flight SQL doesn't parse the query, so you could create commands that are
> >> > >> >> >> >> just JSON text.
> >> > >> >> >> >>
> >> > >> >> >> >> Is that the only bit you need, Gavin?
> >> > >> >> >> >>
> >> > >> >> >> >> On Thu, Mar 3, 2022 at 4:26 PM Gavin Ray <ray.gavin97@gmail.com> wrote:
> >> > >> >> >> >>
> >> > >> >> >> >> > I am enthusiastic about Substrait and have followed its progress eagerly
> >> > >> >> >> >> > =D
> >> > >> >> >> >> >
> >> > >> >> >> >> > When I presented it as a tentative option, there were reservations because
> >> > >> >> >> >> > of the project/spec being young and the functionality still being
> >> > >> >> >> >> > fleshed out.
> >> > >> >> >> >> > I think if I were having this conversation in say, 8-16 months, it would
> >> > >> >> >> >> > have been an easy choice, no doubt.
> >> > >> >> >> >> >
> >> > >> >> >> >> > On a public mailing list (and I can share more details in private if you're
> >> > >> >> >> >> > curious), the gist of it is this:
> >> > >> >> >> >> >
> >> > >> >> >> >> > Some well-defined/backed-by-mature tech solution for expressing data
> >> > >> >> >> >> > compute operations between services would be a useful thing to have
> >> > >> >> >> >> > (Especially if it's language-agnostic)
> >> > >> >> >> >> >
> >> > >> >> >> >> > The goal is for an "implementing service" to have:
> >> > >> >> >> >> > - An introspectable schema (IE, "describe yourself to me")
> >> > >> >> >> >> > - A query/operation execution endpoint (IE: "perform this operation on your
> >> > >> >> >> >> > data")
> >> > >> >> >> >> >
> >> > >> >> >> >> > With FlightSQL this is possible I believe, but it requires the operation to
> >> > >> >> >> >> > be expressed as a SQL string which isn't ideal.
> >> > >> >> >> >> >
> >> > >> >> >> >> > Working with some programmatic, structured object that has the same
> >> > >> >> >> >> > semantics ("Logical Plan", or whatnot) as a SQL query would have, would be
> >> > >> >> >> >> > a better experience
> >> > >> >> >> >> > (Jacques is on to something here!)
> >> > >> >> >> >> >
> >> > >> >> >> >> > This interface between services would be somewhat the equivalent of an
> >> > >> >> >> >> > "SDK", so it would be nice to have a strongly-typed library for expressing
> >> > >> >> >> >> > and building-up query/data-compute ops.
> >> > >> >> >> >> >
> >> > >> >> >> >> >
> >> > >> >> >> >> > On Thu, Mar 3, 2022 at 3:17 PM David Li <lidavidm@apache.org> wrote:
> >> > >> >> >> >> >
> >> > >> >> >> >> > > You probably want Substrait: https://substrait.io/
> >> > >> >> >> >> > >
> >> > >> >> >> >> > > Which is being worked on by several people, including Arrow community
> >> > >> >> >> >> > > members.
> >> > >> >> >> >> > >
> >> > >> >> >> >> > > It might be interesting to generalize Flight SQL to include support for
> >> > >> >> >> >> > > Substrait. I'm curious what your application is, if you're able to share
> >> > >> >> >> >> > > more.
> >> > >> >> >> >> > >
> >> > >> >> >> >> > > -David
> >> > >> >> >> >> > >
> >> > >> >> >> >> > > On Thu, Mar 3, 2022, at 18:05, Gavin Ray wrote:
> >> > >> >> >> >> > > > Hiya,
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > I am drafting a proposal for a way to enable services to express data
> >> > >> >> >> >> > > > compute operations to each other.
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > However I think it'll be difficult to get buy-in if the only representation
> >> > >> >> >> >> > > > for queries is as SQL strings.
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > Is there any kind of lower-level API that can be used to express
> >> > >> >> >> >> > > > operations?
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > IE instead of "SELECT name FROM user"
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > A structured representation like:
> >> > >> >> >> >> > > > {
> >> > >> >> >> >> > > >   "op": "query",
> >> > >> >> >> >> > > >   "schema": "user",
> >> > >> >> >> >> > > >   "project": ["name"]
> >> > >> >> >> >> > > > }
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > Or maybe this is a bad idea/doesn't make sense?
> >> > >> >> >> >> > > >
> >> > >> >> >> >> > > > Thank you =)
> >> > >> >> >> >> > >
> >> > >> >> >> >> >
> >> > >> >> >> >>
> >> > >> >> >> >>
> >> > >> >> >> >> --
> >> > >> >> >> >>
> >> > >> >> >> >> *James Duong*
> >> > >> >> >> >> Lead Software Developer
> >> > >> >> >> >> Bit Quill Technologies Inc.
> >> > >> >> >> >> Direct: +1.604.562.6082 | jamesd@bitquilltech.com
> >> > >> >> >> >> https://www.bitquilltech.com
> >> > >> >> >> >>
> >> > >> >> >> >> This email message is for the sole use of the intended recipient(s) and may
> >> > >> >> >> >> contain confidential and privileged information.  Any unauthorized review,
> >> > >> >> >> >> use, disclosure, or distribution is prohibited.  If you are not the
> >> > >> >> >> >> intended recipient, please contact the sender by reply email and destroy
> >> > >> >> >> >> all copies of the original message.  Thank you.
> >> > >> >> >> >>
> >> > >> >> >>
> >> > >> >> >
> >> > >> >>
> >> > >> >>
> >> > >>
> >> >
> >>
>


-- 

*James Duong*
Lead Software Developer
Bit Quill Technologies Inc.
Direct: +1.604.562.6082 | jamesd@bitquilltech.com
https://www.bitquilltech.com