Posted to user@arrow.apache.org by Will Jones <wi...@gmail.com> on 2022/12/24 19:09:28 UTC

ADBC design question from delta-rs

Hello,

In the delta-rs project, we are looking at creating an ADBC driver for
reading and writing Delta Lake tables [1]. This includes bulk insertion
operations that are more complex than simply appending rows. For example,
the merge operation [2] performs an upsert that will take both an input
stream of data *and* a SQL/Substrait query specifying the update predicate
and update behavior.

There is some description of a bulk insert in the existing headers [3], but
it presents binding a statement and binding the source data as mutually
exclusive. Is that intentional? Or is it considered valid to bind both?

The docstring for AdbcStatementBindStream [4] mentions prepared
statements. Is there an example of using it with one?

Best,

Will Jones

[1] https://docs.google.com/document/d/1ud-iBPg8VVz2N3HxySz9qbrffw6a9I7TiGZJ2MBs7ZE/edit?usp=sharing
[2] https://docs.delta.io/latest/delta-update.html#upsert-into-a-table-using-merge
[3] https://github.com/apache/arrow-adbc/blob/728188c15a8a425d9feff349ed5fc9fd579f7a14/adbc.h#L434-L442
[4] https://github.com/apache/arrow-adbc/blob/728188c15a8a425d9feff349ed5fc9fd579f7a14/adbc.h#L1113-L1124

Re: ADBC design question from delta-rs

Posted by David Li <li...@apache.org>.
You can set the query and also bind a batch or a stream of data; the two are not mutually exclusive. The bulk insert is described separately because it is meant to be database-agnostic, so it doesn't make sense to specify a query there. (It would be equivalent to using an INSERT and binding the data yourself.) If you want to specify the query, it's just a regular query with bound data, which is always a batch or stream and never a single row.
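
As a rough illustration of the "regular query with bound data" shape, here is a minimal sketch using Python's standard sqlite3 module rather than ADBC itself. The table name `events`, its columns, and the helper `upsert_batch` are hypothetical, and `INSERT OR REPLACE` only stands in for a real merge; an ADBC driver would bind an Arrow batch or stream to the statement instead of a list of tuples.

```python
import sqlite3


def upsert_batch(conn, rows):
    """Run one parameterized statement against every row of the bound batch.

    Hypothetical helper: this is an analogy for "set a query, then bind
    data", not the ADBC API.
    """
    conn.executemany(
        # The query carries the update behavior; the rows are the bound data.
        "INSERT OR REPLACE INTO events (id, value) VALUES (?, ?)",
        rows,
    )
    conn.commit()


# In-memory database with one pre-existing row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO events (id, value) VALUES (1, 'old')")

# Bind a batch: id 1 replaces the existing row, id 2 is newly inserted.
upsert_batch(conn, [(1, "new"), (2, "added")])

result = conn.execute("SELECT id, value FROM events ORDER BY id").fetchall()
```

Swapping the statement for a plain `INSERT` gives the append-only behavior that the database-agnostic bulk insert is meant to cover.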