You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/01/20 12:41:14 UTC

[GitHub] [superset] villebro edited a comment on issue #18085: Snowflake: Inconsistent column name case

villebro edited a comment on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017459122


   Ok, so this all stems from the design decision Oracle-like databases have made, which I generally call "lowercase means uppercase means caseless", which IMO is a terrible assumption (I _really_ wish Snowflake hadn't gone with this design). I may be off on some details, but basically it boils down to this: When you write
   
   ```sql
   select column
   from table
   ```
   
   and the column name is caseless (=stored in metadata as UPPERCASE), the resulting column name in the query result is `COLUMN`. You get the same result by writing (correct me if I misremember these)
   
   ```sql
   select Column
   from table
   ```
   
   or
   
   ```sql
   select COLUMN
   from table
   ```
   
   or even
   
   ```sql
   select "COLUMN"
   from table
   ```
   
   but an error when writing
   
   ```sql
   select "column"
   from table
   ```
   
   because the column name isn't defined case-sensitively as "column", but rather stored in the metadata as "COLUMN" (=caseless). However, when you check the column name using `inspector.get_columns`, the column shows up as `column`, i.e. lowercase, which kind of isn't really true using Oracle logic, but ends up working in Superset. The Snowflake dialect does these to/from conversions here: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217-L234 , and these functions kind of make sense when you know the assumptions behind case handling, but is super confusing when you're trying to write code that handles the cases correctly.
   
   Now, we actually have a method in `BaseEngineSpec` called `get_column_spec` which takes the source of the column definition as one argument: https://github.com/apache/superset/blob/035638c95869d0d4b7c9d44f13e88e3535ecdf4c/superset/db_engine_specs/base.py#L1313-L1320
   What we could do here is add `column_name` as an argument and add a field `normalized_name` to the `ColumnSpec` type and return the normalized column name on affected dbs (Oracle, Snowflake et al) if the column metadata originates from a query (=virtual table), but leave it unchanged for physical tables. This would just need to be implemented in the SQLAlchemy model and shouldn't be lots of work, but it would be really important to do super thorough testing to make sure it handles all cases correctly (caseless, UPPERCASE, "lowercase" and "Mixed Case").
   
   This probably sounds really confusing, so maybe it makes sense to have a kickoff call to get this work started to make sure we fully understand the problem and have a clear proposal for solving this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org