You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "lidavidm (via GitHub)" <gi...@apache.org> on 2023/06/12 17:33:31 UTC

[GitHub] [arrow-adbc] lidavidm opened a new pull request, #765: format: add additional features to 1.1.0 spec

lidavidm opened a new pull request, #765:
URL: https://github.com/apache/arrow-adbc/pull/765

   Intends to tackle #736 and #755.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238599646


##########
adbc.h:
##########
@@ -1095,6 +1256,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   Both input and output arguments. JDBC uses the 'columns' nomenclature. (We could try to separate out input/output, but JDBC also allows for an 'unknown' column type so you can't always tell.)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] zeroshade commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235552960


##########
adbc.h:
##########
@@ -1380,6 +1786,66 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  This represents
+/// the driver's mapping of the database type to an Arrow type.

Review Comment:
   Should we clarify that the name of each field should be the database specific name of the type while the field's Type would be the mapped Arrow type?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238922874


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   `getenv` is slightly different as global state where the application may not know what libraries it uses are also doing with the global state; here the operation is scoped to a specific handle that presumably an application can manage itself



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238509215


##########
adbc.h:
##########
@@ -1095,6 +1237,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   Building and traversing these structures is a bit verbose and easy to mess up. Also this is all via Nanoarrow which of course is a more minimal API. I've been prototyping a code generator (generating iterators/builders for fixed schemas, sort of like Protobuf) as a possible solution but it isn't anywhere near ready yet. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238916929


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   I was thinking it's not super useful to only get part of the input (e.g. what if it's a Protobuf? Which is a real use case with gRPC/Flight SQL). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238921965


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Fair. I'm also not sure why functions like vsnprintf choose to do that in the first place; maybe some POSIX convention



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238620511


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   So far things have avoided assuming libc malloc. We could return a struct with a release callback?
   
   Caller-allocated buffers get annoying because you have to somehow tell the caller the size...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235570606


##########
adbc.h:
##########
@@ -1380,6 +1786,66 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  This represents
+/// the driver's mapping of the database type to an Arrow type.

Review Comment:
   I don't want us to specify the casing, that's up to the database.
   
   I'll clarify what the field name was meant to be.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] zeroshade commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235550134


##########
adbc.h:
##########
@@ -1095,6 +1237,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   Given the difficulty and complexity we've seen with `GetObjects`, do we want to follow the same path here for UDFs and Stored procedures? Or do we want to try to find a less complex / less nested route for this to avoid the same problems / complaints?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238604011


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Or you switch to malloc-allocated output, or you let the caller pass a suitable buffer area...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238917810


##########
adbc.h:
##########
@@ -1095,6 +1256,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   I'm actually more inclined to drop these functions entirely now, given that real-world use seems to be spotty. My motivation here is to enable federation via ADBC (engines calling out to other engines) and it seems that even if you have a listing of available functions/procedures, what you actually want is something like Substrait because just the names/types aren't necessarily useful unless you also know semantics



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238605416


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.

Review Comment:
   Perhaps an unsatisfactory answer, but this is based on the database's idea of types, so not every Arrow type is necessarily represented. However, I'll dig up concrete examples of what happens with decimal/timestamp types in databases.
   
   (Timezone handling, though, is a mess, and I'll point out that e.g. Postgres and SQL Server just have naive timestamps + UTC timestamps, with the timezone being purely a display concept that is set on the fly)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235564196


##########
adbc.h:
##########
@@ -811,21 +867,43 @@ struct ADBC_EXPORT AdbcDriver {
 
   AdbcStatusCode (*DatabaseGetOption)(struct AdbcDatabase*, const char*, const char**,
                                       struct AdbcError*);
+  AdbcStatusCode (*DatabaseGetOptionBytes)(struct AdbcDatabase*, const char*,
+                                           const uint8_t**, size_t*, struct AdbcError*);
   AdbcStatusCode (*DatabaseGetOptionInt)(struct AdbcDatabase*, const char*, int64_t*,
                                          struct AdbcError*);
+  AdbcStatusCode (*DatabaseSetOptionBytes)(struct AdbcDatabase*, const char*,
+                                           const uint8_t*, size_t, struct AdbcError*);
   AdbcStatusCode (*DatabaseGetOptionDouble)(struct AdbcDatabase*, const char*, double*,
                                             struct AdbcError*);
   AdbcStatusCode (*DatabaseSetOptionInt)(struct AdbcDatabase*, const char*, int64_t,
                                          struct AdbcError*);
   AdbcStatusCode (*DatabaseSetOptionDouble)(struct AdbcDatabase*, const char*, double,

Review Comment:
   Typo, yes, I'll fix the order



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm merged pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm merged PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] zeroshade commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235547382


##########
adbc.h:
##########
@@ -811,21 +867,43 @@ struct ADBC_EXPORT AdbcDriver {
 
   AdbcStatusCode (*DatabaseGetOption)(struct AdbcDatabase*, const char*, const char**,
                                       struct AdbcError*);
+  AdbcStatusCode (*DatabaseGetOptionBytes)(struct AdbcDatabase*, const char*,
+                                           const uint8_t**, size_t*, struct AdbcError*);
   AdbcStatusCode (*DatabaseGetOptionInt)(struct AdbcDatabase*, const char*, int64_t*,
                                          struct AdbcError*);
+  AdbcStatusCode (*DatabaseSetOptionBytes)(struct AdbcDatabase*, const char*,
+                                           const uint8_t*, size_t, struct AdbcError*);
   AdbcStatusCode (*DatabaseGetOptionDouble)(struct AdbcDatabase*, const char*, double*,
                                             struct AdbcError*);
   AdbcStatusCode (*DatabaseSetOptionInt)(struct AdbcDatabase*, const char*, int64_t,
                                          struct AdbcError*);
   AdbcStatusCode (*DatabaseSetOptionDouble)(struct AdbcDatabase*, const char*, double,

Review Comment:
   Do we want to reorder these a bit so that the Get/Set typed options are next to each other for the same types for easier reading / finding? Either that or can we put ALL the Gets followed by ALL the Sets? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#issuecomment-1607827831

   There's been no movement here so I'll update this PR with Go and Java definitions


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238588760


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Hmm... but if `AdbcDatabaseGetOptionBytes` is called from multiple threads, then it is undefined if the return `value` is still valid when the calling thread consumes it, right?
   
   (same problem as with POSIX `getenv`, actually: https://pubs.opengroup.org/onlinepubs/9699919799/functions/getenv.html)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238839287


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.
+///
+/// Each top-level field may have the following metadata fields:
+///
+/// - adbc:catalog_name
+/// - adbc:db_schema_name
+/// - adbc:supertype_catalog_name
+/// - adbc:supertype_db_schema_name
+/// - adbc:supertype_name

Review Comment:
   Hmm, none of MySQL, SQL Server, or PostgreSQL implement this. I'm inclined to drop this if it never saw real-world usage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238990955


##########
adbc.h:
##########
@@ -1095,6 +1258,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |
+///
+/// 1. Database-specific description of the function.
+/// 2. The kind of function.  Should be null if not known, 1 if the function

Review Comment:
   Mostly just the API reference documentation; the specification itself hasn't been helpful to me (also, it seems it's released as revisions to previous specs instead of as a standalone spec, so e.g. even though [this](https://download.oracle.com/otn-pub/jcp/jdbc-4_3-mrel3-eval-spec/jdbc4.3-fr-spec.pdf?AuthParam=1687272585_864c20ce8d42bd6fd6806595083f3975) is "JDBC 4.3" it doesn't actually describe most things)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1239005667


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Updated that too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] zeroshade commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "zeroshade (via GitHub)" <gi...@apache.org>.

zeroshade commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235552960


##########
adbc.h:
##########
@@ -1380,6 +1786,66 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  This represents
+/// the driver's mapping of the database type to an Arrow type.

Review Comment:
   Should we clarify that the name of each field should be the database specific name of the type while the field's Type would be the mapped Arrow type? Should we establish a standard as far as lower/upper casing of the names?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#issuecomment-1600863206

   Updated to include some additional common statistics after reviewing Hive. 
   
   It may also be good to include the 'histogram' statistic (Oracle, Hive, PostgreSQL) but the encoding for this gets very messy with Arrow: either you need a union[list[int], list[double], ...], or you have to devise some way to pack it into a binary column. (Or possibly we can just include list[binary] and declare that things need to be packed there.)
   
   Regardless, the statistic schema is now fairly complicated with nesting, unions, and dictionary-encoding.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238821195


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Ok, how about something like this?
   
   ```c
   GetOptionBytes(..., uint8_t* value, size_t* length, ...)
   /// length must be provided and must be the size of the buffer pointed
   /// to by value.  If there is sufficient space, the driver will copy
   /// the option value to buffer and set length to the size of the
   /// actual value.  If the buffer is too small, no data will be written
   /// and length will be set to the required length.
   ///
   /// In other words:
   ///
   /// - If output length <= input length, value will contain a value
   ///   with length bytes.
   /// - If output length > input length, nothing has been written to
   ///   value.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238917701


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.

Review Comment:
   Is this expected to be the same as `AdbcConnectionGetTableTypes` except for having the arrow mapping? Is there a reason to have both?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238921667


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Frankly, I'm most inclined to revert to the original definition; multiple threads calling GetOption and invalidating the pointer is the same as a single thread calling GetOption and invalidating a previous pointer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238634043


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   > Caller-allocated buffers get annoying because you have to somehow tell the caller the size...
   
   But at least the lifetime is safe. 



##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   > Caller-allocated buffers get annoying because you have to somehow tell the caller the size...
   
   But at least the lifetime is safe and it makes things simpler for the producer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238887198


##########
adbc.h:
##########
@@ -1095,6 +1237,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   That's my idea with the code generator, to be able to just generate the code for helpers along those lines to build/access these structures and check that the schema is the expected schema



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#issuecomment-1608003187

   Ok, Java and Go definitions are now present.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238840594


##########
adbc.h:
##########
@@ -1362,6 +1743,72 @@ AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection,
                                             struct ArrowSchema* schema,
                                             struct AdbcError* error);
 
+/// \brief Get statistics about the data distribution of table(s).
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | catalog_name             | utf8                             |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>           |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | db_schema_name           | utf8                             |
+/// | db_schema_functions      | list<STATISTICS_SCHEMA>          |
+///
+/// STATISTICS_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       | Comments |
+/// |--------------------------|----------------------------------| -------- |
+/// | table_name               | utf8 not null                    |          |
+/// | column_name              | utf8                             | (1)      |
+/// | statistic_name           | dictionary<int16, utf8> not null | (2)      |
+/// | statistic_value          | VALUE_SCHEMA not null            |          |
+/// | statistic_is_approximate | bool not null                    | (3)      |
+///
+/// 1. If null, then the statistic applies to the entire table.
+/// 2. A dictionary-encoded statistic name. Values in [0, 1024) are
+///    reserved for ADBC use.  Other values are free for
+///    implementation-specific statistics.  For the names and
+///    definitions of predefined statistic types, see \ref
+///    adbc-table-statistics.

Review Comment:
   Switched to just `int16` with a separate call to get driver-specific statistic values.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238901992


##########
adbc.h:
##########
@@ -1095,6 +1258,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |
+///
+/// 1. Database-specific description of the function.
+/// 2. The kind of function.  Should be null if not known, 1 if the function

Review Comment:
   Postgres has aggregate, window, procedure and standard functions. Not sure how all databases manage this but seems like we might want to have finer granularity than just the 1/2 proposed here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238988560


##########
adbc.h:
##########
@@ -1095,6 +1258,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |
+///
+/// 1. Database-specific description of the function.
+/// 2. The kind of function.  Should be null if not known, 1 if the function

Review Comment:
   Out of curiosity where do you find the JDBC specification details? Having trouble finding a definitive authority on that via search



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238973860


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Updated, I suppose we should also update the base `GetOption` (returns char*) though



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238596964


##########
adbc.h:
##########
@@ -1362,6 +1743,72 @@ AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection,
                                             struct ArrowSchema* schema,
                                             struct AdbcError* error);
 
+/// \brief Get statistics about the data distribution of table(s).
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | catalog_name             | utf8                             |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>           |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | db_schema_name           | utf8                             |
+/// | db_schema_functions      | list<STATISTICS_SCHEMA>          |
+///
+/// STATISTICS_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       | Comments |
+/// |--------------------------|----------------------------------| -------- |
+/// | table_name               | utf8 not null                    |          |
+/// | column_name              | utf8                             | (1)      |
+/// | statistic_name           | dictionary<int16, utf8> not null | (2)      |
+/// | statistic_value          | VALUE_SCHEMA not null            |          |
+/// | statistic_is_approximate | bool not null                    | (3)      |
+///
+/// 1. If null, then the statistic applies to the entire table.
+/// 2. A dictionary-encoded statistic name. Values in [0, 1024) are
+///    reserved for ADBC use.  Other values are free for
+///    implementation-specific statistics.  For the names and
+///    definitions of predefined statistic types, see \ref
+///    adbc-table-statistics.

Review Comment:
   Ah...right, dictionaries aren't sparse. I will probably abandon this idea then and just have it contain strings, and take the potential space hit (likely won't be a big deal).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238892112


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Any reason to not follow the pattern of functions like vsnprintf? i.e. print what you can but ultimately left to the caller to determine if the size was sufficient



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238593601


##########
adbc.h:
##########
@@ -1095,6 +1256,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   Does `function_columns` describe the function arguments?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238575003


##########
adbc.h:
##########
@@ -1362,6 +1743,72 @@ AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection,
                                             struct ArrowSchema* schema,
                                             struct AdbcError* error);
 
+/// \brief Get statistics about the data distribution of table(s).
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | catalog_name             | utf8                             |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>           |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | db_schema_name           | utf8                             |
+/// | db_schema_functions      | list<STATISTICS_SCHEMA>          |
+///
+/// STATISTICS_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       | Comments |
+/// |--------------------------|----------------------------------| -------- |
+/// | table_name               | utf8 not null                    |          |
+/// | column_name              | utf8                             | (1)      |
+/// | statistic_name           | dictionary<int16, utf8> not null | (2)      |
+/// | statistic_value          | VALUE_SCHEMA not null            |          |
+/// | statistic_is_approximate | bool not null                    | (3)      |
+///
+/// 1. If null, then the statistic applies to the entire table.
+/// 2. A dictionary-encoded statistic name. Values in [0, 1024) are
+///    reserved for ADBC use.  Other values are free for
+///    implementation-specific statistics.  For the names and
+///    definitions of predefined statistic types, see \ref
+///    adbc-table-statistics.

Review Comment:
   This is a bit weird. Does it mean the dictionary _must_ have more than 1024 entries if impl-specific stats are desired?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238578210


##########
adbc.h:
##########
@@ -386,6 +413,54 @@ struct ADBC_EXPORT AdbcError {
 /// \see AdbcConnectionGetObjects
 #define ADBC_OBJECT_DEPTH_COLUMNS ADBC_OBJECT_DEPTH_ALL
 
+/// \defgroup adbc-table-statistics ADBC Statistic Types
+/// Standard statistic names for AdbcConnectionGetTableStatistics.
+/// @{
+
+/// \brief The dictionary-encoded name of the average byte width statistic.
+#define ADBC_STATISTIC_AVERAGE_BYTE_WIDTH_KEY 0
+/// \brief The average byte width statistic.  The average size in bytes of a
+///   row in the column.  Value type is float64.
+///
+/// For example, this is roughly the average length of a string for a string
+/// column.
+#define ADBC_STATISTIC_AVERAGE_BYTE_WIDTH_NAME "adbc.statistic.byte_width"
+/// \brief The dictionary-encoded name of the distinct value count statistic.
+#define ADBC_STATISTIC_DISTINCT_COUNT_KEY 1
+/// \brief The distinct value count (NDV) statistic.  The number of distinct
+///   values in the column.  Value type is uint64 (when not approximate) or

Review Comment:
   AFAIU, Java doesn't really like unsigned integers, especially 64-bit. Would it help to switch to int64?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238918479


##########
adbc.h:
##########
@@ -1095,6 +1258,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |
+///
+/// 1. Database-specific description of the function.
+/// 2. The kind of function.  Should be null if not known, 1 if the function

Review Comment:
   Yes, this tried to adhere fairly closely to JDBC and so it's not very expressive. But I think so far it's mostly folly to try to unify these different definitions until we see some more concrete use cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238914019


##########
adbc.h:
##########
@@ -1095,6 +1258,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |
+///
+/// 1. Database-specific description of the function.
+/// 2. The kind of function.  Should be null if not known, 1 if the function

Review Comment:
   Also worth noting that while the proposal here has separate interfaces for functions versus procedures, postgres considers both to be a type of routine. Client tools like psql will list both via the `\df` shortcut



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#issuecomment-1599108142

   @zeroshade this should be the rest of the requested APIs for 1.1.0, if this looks reasonable I'll port to Java and Go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235620579


##########
adbc.h:
##########
@@ -1095,6 +1237,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   That said you could say GetTableStatistics doesn't follow it; I can adjust that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1235571112


##########
adbc.h:
##########
@@ -1095,6 +1237,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   I've opted to keep it for consistency. I think this being difficult is a reflection of the Arrow APIs still being immature, since all things considered this is a fairly straightforward nested schema.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] WillAyd commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "WillAyd (via GitHub)" <gi...@apache.org>.

WillAyd commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238910237


##########
adbc.h:
##########
@@ -1095,6 +1256,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   Not sure about JDBC but I do also think this would be nice to separate; when you consider all of the different function types that may or may not return something I think it is easier to manage when not all bundled into one field



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238597452


##########
adbc.h:
##########
@@ -386,6 +413,54 @@ struct ADBC_EXPORT AdbcError {
 /// \see AdbcConnectionGetObjects
 #define ADBC_OBJECT_DEPTH_COLUMNS ADBC_OBJECT_DEPTH_ALL
 
+/// \defgroup adbc-table-statistics ADBC Statistic Types
+/// Standard statistic names for AdbcConnectionGetTableStatistics.
+/// @{
+
+/// \brief The dictionary-encoded name of the average byte width statistic.
+#define ADBC_STATISTIC_AVERAGE_BYTE_WIDTH_KEY 0
+/// \brief The average byte width statistic.  The average size in bytes of a
+///   row in the column.  Value type is float64.
+///
+/// For example, this is roughly the average length of a string for a string
+/// column.
+#define ADBC_STATISTIC_AVERAGE_BYTE_WIDTH_NAME "adbc.statistic.byte_width"
+/// \brief The dictionary-encoded name of the distinct value count statistic.
+#define ADBC_STATISTIC_DISTINCT_COUNT_KEY 1
+/// \brief The distinct value count (NDV) statistic.  The number of distinct
+///   values in the column.  Value type is uint64 (when not approximate) or

Review Comment:
   I figured negative values aren't useful here, but that's fair - int64 should have more than enough range.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238598132


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.
+///
+/// Each top-level field may have the following metadata fields:
+///
+/// - adbc:catalog_name
+/// - adbc:db_schema_name
+/// - adbc:supertype_catalog_name
+/// - adbc:supertype_db_schema_name
+/// - adbc:supertype_name

Review Comment:
   What is the supertype supposed to denote?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238597540


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.

Review Comment:
   How does it work for e.g. timestamps or fixed-size strings? Should the database return all supported timezones? One of them? Same question for decimals.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238958652


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.

Review Comment:
   Ah, this is about data types, not table types. That said: I'm inclined to drop this too (practically, you only care about the types in the context of a table, in which case GetTableSchema should be sufficient)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#issuecomment-1609943988

   I'm going to merge this for now so I can start drafting #319; we can come back and revisit the Go/Java APIs if needed (I know zeroshade is out for a bit so I don't want this held up)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238504656


##########
adbc.h:
##########
@@ -1095,6 +1237,112 @@ ADBC_EXPORT
 AdbcStatusCode AdbcConnectionRelease(struct AdbcConnection* connection,
                                      struct AdbcError* error);
 
+/// \brief Cancel the in-progress operation on a connection.
+///
+/// This can be called during AdbcConnectionGetObjects (or similar),
+/// or while consuming an ArrowArrayStream returned from such.
+/// Calling this function should make the other functions return
+/// ADBC_STATUS_CANCELLED (from ADBC functions) or ECANCELED (from
+/// methods of ArrowArrayStream).
+///
+/// This must always be thread-safe (other operations are not).
+///
+/// \since ADBC API revision 1.1.0
+/// \addtogroup adbc-1.1.0
+///
+/// \param[in] connection The connection to cancel.
+/// \param[out] error An optional location to return an error
+///   message if necessary.
+///
+/// \return ADBC_STATUS_INVALID_STATE if there is no operation to cancel.
+/// \return ADBC_STATUS_UNKNOWN if the operation could not be cancelled.
+ADBC_EXPORT
+AdbcStatusCode AdbcConnectionCancel(struct AdbcConnection* connection,
+                                    struct AdbcError* error);
+
+/// \brief Get a hierarchical view of system and user-defined functions.
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | catalog_name             | utf8                    |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>  |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              |
+/// |--------------------------|-------------------------|
+/// | db_schema_name           | utf8                    |
+/// | db_schema_functions      | list<FUNCTION_SCHEMA>   |
+///
+/// FUNCTION_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type              | Comments |
+/// |--------------------------|-------------------------|----------|
+/// | function_name            | utf8 not null           |          |
+/// | remarks                  | utf8                    | (1)      |
+/// | function_type            | int16                   | (2)      |
+/// | specific_name            | utf8                    | (3)      |
+/// | function_columns         | list<COLUMN_SCHEMA>     |          |

Review Comment:
   > Given the difficulty and complexity we've seen with `GetObjects`, do we want to follow the same path here for UDFs and Stored procedures? Or do we want to try to find a less complex / less nested route for this to avoid the same problems / complaints?
   
   Can you elaborate on the difficulty? Is it about easily accessing nested data from C++? If so, perhaps some separate helpers could be added for ease of navigating struct arrays and scalars...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238598784


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   Gah. Possibly "thread-safe with respect to *other* functions"? 
   
   The only reason why I wanted it to be thread-safe was to allow for a progress indicator (on GetDouble). Maybe we can split that out into a dedicated function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238602761


##########
adbc.h:
##########
@@ -1362,6 +1743,72 @@ AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection,
                                             struct ArrowSchema* schema,
                                             struct AdbcError* error);
 
+/// \brief Get statistics about the data distribution of table(s).
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | catalog_name             | utf8                             |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>           |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | db_schema_name           | utf8                             |
+/// | db_schema_functions      | list<STATISTICS_SCHEMA>          |
+///
+/// STATISTICS_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       | Comments |
+/// |--------------------------|----------------------------------| -------- |
+/// | table_name               | utf8 not null                    |          |
+/// | column_name              | utf8                             | (1)      |
+/// | statistic_name           | dictionary<int16, utf8> not null | (2)      |
+/// | statistic_value          | VALUE_SCHEMA not null            |          |
+/// | statistic_is_approximate | bool not null                    | (3)      |
+///
+/// 1. If null, then the statistic applies to the entire table.
+/// 2. A dictionary-encoded statistic name. Values in [0, 1024) are
+///    reserved for ADBC use.  Other values are free for
+///    implementation-specific statistics.  For the names and
+///    definitions of predefined statistic types, see \ref
+///    adbc-table-statistics.

Review Comment:
   Well, I think you want to make lookups easy as well. As is "I want the min and max of column a.b.c".



##########
adbc.h:
##########
@@ -1362,6 +1743,72 @@ AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection,
                                             struct ArrowSchema* schema,
                                             struct AdbcError* error);
 
+/// \brief Get statistics about the data distribution of table(s).
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | catalog_name             | utf8                             |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>           |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | db_schema_name           | utf8                             |
+/// | db_schema_functions      | list<STATISTICS_SCHEMA>          |
+///
+/// STATISTICS_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       | Comments |
+/// |--------------------------|----------------------------------| -------- |
+/// | table_name               | utf8 not null                    |          |
+/// | column_name              | utf8                             | (1)      |
+/// | statistic_name           | dictionary<int16, utf8> not null | (2)      |
+/// | statistic_value          | VALUE_SCHEMA not null            |          |
+/// | statistic_is_approximate | bool not null                    | (3)      |
+///
+/// 1. If null, then the statistic applies to the entire table.
+/// 2. A dictionary-encoded statistic name. Values in [0, 1024) are
+///    reserved for ADBC use.  Other values are free for
+///    implementation-specific statistics.  For the names and
+///    definitions of predefined statistic types, see \ref
+///    adbc-table-statistics.

Review Comment:
   Well, I think you want to make lookups easy (and reasonably efficient?) as well. As is "I want the min and max of column a.b.c".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238603545


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.
+///
+/// Each top-level field may have the following metadata fields:
+///
+/// - adbc:catalog_name
+/// - adbc:db_schema_name
+/// - adbc:supertype_catalog_name
+/// - adbc:supertype_db_schema_name
+/// - adbc:supertype_name

Review Comment:
   This is taken from JDBC, but I still need to find a concrete example of how this works. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238623148


##########
adbc.h:
##########
@@ -1362,6 +1743,72 @@ AdbcStatusCode AdbcConnectionGetTableSchema(struct AdbcConnection* connection,
                                             struct ArrowSchema* schema,
                                             struct AdbcError* error);
 
+/// \brief Get statistics about the data distribution of table(s).
+///
+/// The result is an Arrow dataset with the following schema:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | catalog_name             | utf8                             |
+/// | catalog_db_schemas       | list<DB_SCHEMA_SCHEMA>           |
+///
+/// DB_SCHEMA_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       |
+/// |--------------------------|----------------------------------|
+/// | db_schema_name           | utf8                             |
+/// | db_schema_functions      | list<STATISTICS_SCHEMA>          |
+///
+/// STATISTICS_SCHEMA is a Struct with fields:
+///
+/// | Field Name               | Field Type                       | Comments |
+/// |--------------------------|----------------------------------| -------- |
+/// | table_name               | utf8 not null                    |          |
+/// | column_name              | utf8                             | (1)      |
+/// | statistic_name           | dictionary<int16, utf8> not null | (2)      |
+/// | statistic_value          | VALUE_SCHEMA not null            |          |
+/// | statistic_is_approximate | bool not null                    | (3)      |
+///
+/// 1. If null, then the statistic applies to the entire table.
+/// 2. A dictionary-encoded statistic name. Values in [0, 1024) are
+///    reserved for ADBC use.  Other values are free for
+///    implementation-specific statistics.  For the names and
+///    definitions of predefined statistic types, see \ref
+///    adbc-table-statistics.

Review Comment:
   We could just make it an int column with a separate call to get the "dictionary" (without making it a formal Arrow dictionary). Non-standardized statistic names require special handling by callers anyways.
   
   We could also make the schema wider; this is often the actual implementation underneath. But that makes it less extensible (unless we have a generic `list[struct[string, union[...]]]` column at the end)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238835547


##########
adbc.h:
##########
@@ -1380,6 +1827,67 @@ AdbcStatusCode AdbcConnectionGetTableTypes(struct AdbcConnection* connection,
                                            struct ArrowArrayStream* out,
                                            struct AdbcError* error);
 
+/// \brief Get a view of types supported by the database.
+///
+/// The result is an Arrow schema with one field per type.  The field name is
+/// the database's name for the type, and the field type is the driver's
+/// mapping of the database type to an Arrow type.

Review Comment:
   So for PostgreSQL, only one timestamptz and one numeric type are listed, even though both types are technically parametrized. The driver, implementing this call, would also just return `timestamptz = timestamp[us, UTC]` and `numeric = decimal256(76, 0)` (the PostgreSQL `numeric` type would actually map to...something complicated, but ignore that).
   
   Composite types (structs) have to be [spelled out](https://www.postgresql.org/docs/current/rowtypes.html), so each one defined in the database would get its own entry here, and the driver should be able to reconstruct the child fields.
   
   There is a record type, which wouldn't map cleanly onto anything:
   
   ```
   postgres=# SELECT ROW(1, 2);
     row  
   -------
    (1,2)
   (1 row)
   
   postgres=# SELECT pg_typeof(ROW(1, 2));
    pg_typeof 
   -----------
    record
   (1 row)
   
   postgres=# SELECT ROW(1, 2) UNION ALL (SELECT ROW(1));
     row  
   -------
    (1,2)
    (1)
   (2 rows)
   ```
   
   I suppose we could stipulate that if there is no fixed type mapping for a type, the driver can just ignore it in this call.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #765: feat(format): add additional features to 1.1.0 spec

Posted by "pitrou (via GitHub)" <gi...@apache.org>.

pitrou commented on code in PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#discussion_r1238950291


##########
adbc.h:
##########
@@ -907,6 +1008,34 @@ AdbcStatusCode AdbcDatabaseNew(struct AdbcDatabase* database, struct AdbcError*
 AdbcStatusCode AdbcDatabaseGetOption(struct AdbcDatabase* database, const char* key,
                                      const char** value, struct AdbcError* error);
 
+/// \brief Get a bytestring option of the database.
+///
+/// This must always be thread-safe (other operations are not).

Review Comment:
   > Ok, how about something like this?
   
   LGTM



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] github-actions[bot] commented on pull request #765: format: add additional features to 1.1.0 spec

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #765:
URL: https://github.com/apache/arrow-adbc/pull/765#issuecomment-1587767382

   :warning: Please follow the [Conventional Commits format in CONTRIBUTING.md](https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md) for PR titles.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org