You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@arrow.apache.org by "kou (via GitHub)" <gi...@apache.org> on 2023/04/09 21:37:16 UTC

[GitHub] [arrow-adbc] kou opened a new issue, #581: c/driver/sqlite: Column type is always int64 with empty table

kou opened a new issue, #581:
URL: https://github.com/apache/arrow-adbc/issues/581

   With the following SQL, the SQLite driver returns `number:int64, string:int64` schema not `number:int64, string:string` schema:
   
   ```sql
   CREATE TABLE data (number int, string text);
   SELECT * FROM data;
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on issue #581: c/driver/sqlite: Column type is always int64 with empty table

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on issue #581:
URL: https://github.com/apache/arrow-adbc/issues/581#issuecomment-1503147838

   Not really. There's also not a great way to pass in a schema currently.
   
   We don't need a SQL parser. It's just that SQLite is rather fuzzy about what type names it accepts. The rules are simple, but surprising.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on issue #581: c/driver/sqlite: Column type is always int64 with empty table

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on issue #581:
URL: https://github.com/apache/arrow-adbc/issues/581#issuecomment-1501381444

   Yes, decltype would work. I think the hard part there will be that SQLite doesn't really check the type name so we would have to replicate 3.1.1 here: https://www.sqlite.org/datatype3.html
   
   Also it looks that decltype will fail if it's something like `x + 1` so in that case the user might still get a type that is a little unexpected. 
   
   On Mon, Apr 10, 2023, at 10:10, Sutou Kouhei wrote:
   > 
   > 
   > Ah, I thought that `sqlite3_column_type()` can work with an empty table. So I thought that we can improve the current implementation by:
   > 
   > diff --git a/c/driver/sqlite/statement_reader.c b/c/driver/sqlite/statement_reader.c
   > index 694b8cc..21fb20f 100644
   > --- a/c/driver/sqlite/statement_reader.c
   > +++ b/c/driver/sqlite/statement_reader.c
   > @@ -495,7 +495,7 @@ void StatementReaderRelease(struct ArrowArrayStream* self) {
   >  
   >  /// Initialize buffers for the first (type-inferred) batch of data.
   >  /// Use raw buffers since the types may change.
   > -AdbcStatusCode StatementReaderInitializeInfer(int num_columns, size_t infer_rows,
   > +AdbcStatusCode StatementReaderInitializeInfer(sqlite3_stmt *stmt, int num_columns, size_t infer_rows,
   >                                                struct ArrowBitmap* validity,
   >                                                struct ArrowBuffer* data,
   >                                                struct ArrowBuffer* binary,
   > @@ -507,7 +507,21 @@ AdbcStatusCode StatementReaderInitializeInfer(int num_columns, size_t infer_rows
   >      ArrowBufferInit(&data[i]);
   >      CHECK_NA(INTERNAL, ArrowBufferReserve(&data[i], infer_rows * sizeof(int64_t)), error);
   >      memset(&binary[i], 0, sizeof(struct ArrowBuffer));
   > -    current_type[i] = NANOARROW_TYPE_INT64;
   > +    int sqlite_type = sqlite3_column_type(stmt, i);
   > +    switch (sqlite_type) {
   > +    case SQLITE_FLOAT:
   > +      current_type[i] = NANOARROW_TYPE_DOUBLE;
   > +      break;
   > +    case SQLITE_TEXT:
   > +      current_type[i] = NANOARROW_TYPE_STRING;
   > +      break;
   > +    case SQLITE_BLOB:
   > +      current_type[i] = NANOARROW_TYPE_BINARY;
   > +      break;
   > +    default:
   > +      current_type[i] = NANOARROW_TYPE_INT64;
   > +      break;
   > +    }
   >    }
   >    return ADBC_STATUS_OK;
   >  }  // NOLINT(whitespace/indent)
   > @@ -835,7 +849,7 @@ AdbcStatusCode AdbcSqliteExportReader(sqlite3* db, sqlite3_stmt* stmt,
   >    enum ArrowType* current_type = malloc(num_columns * sizeof(enum ArrowType));
   >  
   >    AdbcStatusCode status = StatementReaderInitializeInfer(
   > -      num_columns, batch_size, validity, data, binary, current_type, error);
   > +      stmt, num_columns, batch_size, validity, data, binary, current_type, error);
   >    if (status == ADBC_STATUS_OK) {
   >      int64_t num_rows = 0;
   >      while (num_rows < batch_size) {
   > But `sqlite3_column_type()` always returns `SQLITE_NULL` with the case. :<
   > 
   > We may want to use `sqlite3_column_decltype()`...
   > 
   > 
   > 
   > —
   > Reply to this email directly, view it on GitHub <https://github.com/apache/arrow-adbc/issues/581#issuecomment-1501267541>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACQB36Y4GKADDJHBM6LKXLXANMXJANCNFSM6AAAAAAWYK5JDU>.
   > You are receiving this because you commented.Message ID: ***@***.***>
   > 
   > 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on issue #581: c/driver/sqlite: Column type is always int64 with empty table

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on issue #581:
URL: https://github.com/apache/arrow-adbc/issues/581#issuecomment-1503148935

   Or really: SQLite is untyped, despite whatever it lets you say. We can sort of patch over cases like this that look like they should work, but fundamentally it's just a fiction that we're just trying to construct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] lidavidm commented on issue #581: c/driver/sqlite: Column type is always int64 with empty table

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.

lidavidm commented on issue #581:
URL: https://github.com/apache/arrow-adbc/issues/581#issuecomment-1501257240

   It's kind of a consequence of it inferring/guessing column types instead of trying to leverage metadata (since SQLite doesn't enforce types). We could have it use specified types as a starting point/as a restriction (erroring if data differs).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] kou commented on issue #581: c/driver/sqlite: Column type is always int64 with empty table

Posted by "kou (via GitHub)" <gi...@apache.org>.

kou commented on issue #581:
URL: https://github.com/apache/arrow-adbc/issues/581#issuecomment-1501267541

   Ah, I thought that `sqlite3_column_type()` can work with an empty table. So I thought that we can improve the current implementation by:
   
   ```diff
   diff --git a/c/driver/sqlite/statement_reader.c b/c/driver/sqlite/statement_reader.c
   index 694b8cc..21fb20f 100644
   --- a/c/driver/sqlite/statement_reader.c
   +++ b/c/driver/sqlite/statement_reader.c
   @@ -495,7 +495,7 @@ void StatementReaderRelease(struct ArrowArrayStream* self) {
    
    /// Initialize buffers for the first (type-inferred) batch of data.
    /// Use raw buffers since the types may change.
   -AdbcStatusCode StatementReaderInitializeInfer(int num_columns, size_t infer_rows,
   +AdbcStatusCode StatementReaderInitializeInfer(sqlite3_stmt *stmt, int num_columns, size_t infer_rows,
                                                  struct ArrowBitmap* validity,
                                                  struct ArrowBuffer* data,
                                                  struct ArrowBuffer* binary,
   @@ -507,7 +507,21 @@ AdbcStatusCode StatementReaderInitializeInfer(int num_columns, size_t infer_rows
        ArrowBufferInit(&data[i]);
        CHECK_NA(INTERNAL, ArrowBufferReserve(&data[i], infer_rows * sizeof(int64_t)), error);
        memset(&binary[i], 0, sizeof(struct ArrowBuffer));
   -    current_type[i] = NANOARROW_TYPE_INT64;
   +    int sqlite_type = sqlite3_column_type(stmt, i);
   +    switch (sqlite_type) {
   +    case SQLITE_FLOAT:
   +      current_type[i] = NANOARROW_TYPE_DOUBLE;
   +      break;
   +    case SQLITE_TEXT:
   +      current_type[i] = NANOARROW_TYPE_STRING;
   +      break;
   +    case SQLITE_BLOB:
   +      current_type[i] = NANOARROW_TYPE_BINARY;
   +      break;
   +    default:
   +      current_type[i] = NANOARROW_TYPE_INT64;
   +      break;
   +    }
      }
      return ADBC_STATUS_OK;
    }  // NOLINT(whitespace/indent)
   @@ -835,7 +849,7 @@ AdbcStatusCode AdbcSqliteExportReader(sqlite3* db, sqlite3_stmt* stmt,
      enum ArrowType* current_type = malloc(num_columns * sizeof(enum ArrowType));
    
      AdbcStatusCode status = StatementReaderInitializeInfer(
   -      num_columns, batch_size, validity, data, binary, current_type, error);
   +      stmt, num_columns, batch_size, validity, data, binary, current_type, error);
      if (status == ADBC_STATUS_OK) {
        int64_t num_rows = 0;
        while (num_rows < batch_size) {
   ```
   
   But `sqlite3_column_type()` always returns `SQLITE_NULL` with the case. :<
   
   We may want to use `sqlite3_column_decltype()`...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-adbc] paleolimbot commented on issue #581: c/driver/sqlite: Column type is always int64 with empty table

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.

paleolimbot commented on issue #581:
URL: https://github.com/apache/arrow-adbc/issues/581#issuecomment-1502289180

   Is there a mechanism by which a user can bind an output schema? (That would let a user get a consistent output schema if that was important for an application...short of replicating an entire SQL parser, it might be the only way to ensure that in this case?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org