Posted to github@arrow.apache.org by "melgenek (via GitHub)" <gi...@apache.org> on 2023/05/14 13:44:19 UTC

[GitHub] [arrow-datafusion] melgenek commented on issue #6349: Sqllogictests doesn't cover cases if the column name is not expected.

melgenek commented on issue #6349:
URL: https://github.com/apache/arrow-datafusion/issues/6349#issuecomment-1546903943

   As far as I can tell, neither sqllogictest in general nor `sqllogictest-rs` in particular supports column name checks right now.
   
   There are some ways to extend `sqllogictest-rs` to support column names. It seems that to add native support for them one would need to do the following:
   1) add a `colnames` option to the `query` clause that switches the column name check on; the check would be off by default. Column names could be the first row in the text representation
   ```
   query II rowsort colnames
   select 1 as one, 2 as two;
   ----
   one two
   1     2
   ```
   2) update [DBOutput](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L37) and [RecordOutput](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L23) to have a field `column_names`.
   3) update implementations of `sqllogictest::AsyncDB` to return column names along with types and results
   4) introduce a validator for column names similar to [the type and result validators](https://github.com/risinglightdb/sqllogictest-rs/blob/27eb9f50993e10b36c1f4f68ad3afe499adbbb49/sqllogictest/src/runner.rs#L416-L450). The default implementation would probably be a no-op, to prevent behavior changes for library users other than DataFusion. A real implementation would likely just lowercase the names and compare them as strings.
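   
   Steps 2 and 4 could look roughly like the sketch below. It is standalone Rust, not the actual `sqllogictest-rs` API: `QueryOutput`, `ColumnNameValidator`, `ignore_column_names`, `case_insensitive_column_names` and `check_names` are all hypothetical names made up for illustration.
   
   ```rust
   /// Hypothetical query output carrying the `column_names` field from step 2.
   /// This is a stand-in for illustration, not the real `DBOutput`/`RecordOutput`.
   #[derive(Debug, Clone, PartialEq)]
   pub struct QueryOutput {
       pub column_names: Vec<String>,
       pub rows: Vec<Vec<String>>,
   }
   
   /// Hypothetical validator signature (step 4): returns true when the
   /// actual column names are accepted for the expected ones.
   pub type ColumnNameValidator = fn(actual: &[String], expected: &[String]) -> bool;
   
   /// No-op default: accepts any names, so existing users see no behavior change.
   pub fn ignore_column_names(_actual: &[String], _expected: &[String]) -> bool {
       true
   }
   
   /// Stricter validator: same number of names, compared after lowercasing.
   pub fn case_insensitive_column_names(actual: &[String], expected: &[String]) -> bool {
       actual.len() == expected.len()
           && actual
               .iter()
               .zip(expected)
               .all(|(a, e)| a.to_lowercase() == e.to_lowercase())
   }
   
   /// Runs the chosen validator against the names returned by the database.
   pub fn check_names(
       output: &QueryOutput,
       expected: &[String],
       validator: ColumnNameValidator,
   ) -> bool {
       validator(&output.column_names, expected)
   }
   ```
   
   With this shape, a runner would pass `ignore_column_names` unless the `colnames` option is present on the query, and DataFusion could opt in to `case_insensitive_column_names`.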
   
   -----------
   On the other hand, one could make comparisons work on the DataFusion side.
   
   It seems that the `EXPLAIN` statement already gives the alias names for projections, so it is possible to run the same query twice: once with `EXPLAIN`, once without. This way both names and values are checked. Of course, an `EXPLAIN` output contains a lot more information than just the names.
   
   Another way is to create a more powerful version of `arrow_typeof`. For example, Databricks has a [DESCRIBE QUERY](https://spark.apache.org/docs/3.0.0-preview/sql-ref-syntax-aux-describe-query.html) and Snowflake has a [DESCRIBE RESULT](https://docs.snowflake.com/en/sql-reference/sql/desc-result) that show the expected output metadata in a format similar to
   ```
     +---------+------------+
     |col_name |data_type   |
     +---------+------------+
     |one      | int        |
     |two      | bigint     |
     +---------+------------+
   ```
   This way both specific Arrow types and names could be checked with one query.
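   
   As a rough illustration of why one query would be enough, the sketch below turns a list of output columns into the one-row-per-column text that a test could compare. It is plain Rust under stated assumptions: `ColumnInfo` and `describe_rows` are hypothetical helpers, not DataFusion or Arrow APIs, and the type names are the ones from the table above.
   
   ```rust
   /// Hypothetical description of one output column: its name and its type.
   pub struct ColumnInfo {
       pub name: String,
       pub data_type: String,
   }
   
   /// Renders one "name type" line per column, in column order, so that a
   /// single expected block covers both the names and the types.
   pub fn describe_rows(columns: &[ColumnInfo]) -> Vec<String> {
       columns
           .iter()
           .map(|c| format!("{} {}", c.name, c.data_type))
           .collect()
   }
   ```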

