You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/01/19 05:56:57 UTC

[GitHub] [superset] rumbin opened a new issue #18085: Snowflake: Inconsistent column name case

rumbin opened a new issue #18085:
URL: https://github.com/apache/superset/issues/18085


   Superset handles the case of non-case-sensitive column names inconsistently for Snowflake connections:
   
   * Native tables/views, i.e., physical datasets, have _lowercase_ column names, as it is internally handled by SQLAlchemy.
   * SQLLab's virtual datasets have _UPPERCASE_ column names.
   
   This is troublesome for all dashboards where filters are present which act on charts with related, mixed physical/virtual datasets.  
   Superset's filter scoping is case sensitive. Thus, filters on a certain column name will _either_ be applied to the related physical _or_ virtual datasets.
   
   #### How to reproduce the bug
   
   1. Register any Snowflake table with non-case-sensitive column names (i.e., UPPERCASE) as a physical dataset in Superset.
   2. Notice how its columns are recognized and stored in lowercase.
   3. Edit the dataset again and convert it to a _virtual_ dataset. Use `select * from …` as statement text. Save the changes.
   4. Edit the dataset again and synchronize the column names.
   5. Notice how the column names now changed to UPPERCASE. 
   
   ### Expected results
   
   The case should not depend on whether the dataset is physical or virtual.  
   All non-case-sensitive (i.e., UPPERCASE in Snowflake) should be converted to lowercase consistently, also for virtual datasets created in SQLLab.
   
   Lowercase is the internal representation of SQLALchemy.
   
   ### Actual results
   Virtual dataset's column names are treated as UPPERCASE.
   
   #### Screenshots
   The easies way to experience this effect is to simply explore a Snowflake table in SQLLab.
   As you can see in the following screenshot, the column names that are extracted from the schema and displayed in the schema browser on the left are represented as _lowercase_, while the query results of the preview table have _UPPERCASE_ column names:
   
   ![image](https://user-images.githubusercontent.com/1220356/150071559-d51466fe-e181-4148-8628-a6b4d4b17f43.png)
   
   
   ### Rejected workarounds
   An option would of course be to use double-quoted lowercase column aliases in the `SELECT` statement of the virtual dataset.
   This would make these column names case-sensitive and thus be treated as lowercase.
   However, with a broad userbase working with Snowflake/Superset this is very unhandy.  
   Furthermore, this apporach would not allow any `select *` statements here.
   
   I think it simply is a bug and it needs to be fixed so the way the column names are treated is always consistent.
   
   ### Environment
   
   Tested versions:
   
   * 1.0.1
   * 1.4 rc4
   
   ### Checklist
   
   Make sure to follow these steps before submitting your issue - thank you!
   
   - [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
   - [x] I have reproduced the issue with at least the latest released version of superset.
   - [x] I have checked the issue tracker for the same issue and I haven't found one similar.
   
   ### Additional context
   
   This SQLALchemy issue is closely related: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157#issuecomment-807922786
   
   However, my feeling is that we should probably try to stick with SQLALchemy's way of treating the column names as lowercase instead of uppercase. However, then we need to do so consistently and correct this for the virtual datasets.
   
   **tai** pointed me in Slack to this piece of code: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016671588


   Reopening the issue for the sake of discussion. 
   
   I don't think there's any design decision here, the column names should be returned in the same case they're defined in the database. 
   
   Superset isn't doing any case transformation on the columns and it cannot safely assume that all columns names should be in upper or lower case, it's up to the db/driver to return the correct case for superset to use, imo. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016129036


   @nytai I do not agree that this is the same issue.
   In the upstream issue it is proposed to treat the columns as uppercase while the proposed solution here is lowercase.
   
   I think that there needs to be a design discussion in this issue before deciding how the column case is to be handled.
   This _may_ then lead to the conclusion that the upstream bug is sufficient.
   
   Would you please reopen the issue?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai edited a comment on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai edited a comment on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016114073


   This issue is the same as the one you linked https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157
   
   
   Superset is using that library to connect to snowflake so it needs to be fixed there, there’s nothing that can be done in superset to fix this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017572238


   @villebro in my eyes, your elaboration on the case handling is correct (including your opinion on the poor default choice of UPPERCASE...).
   
   About the rest of your explanation you lost me somewhere, though.
   What is — in your eyes — the intended behavior of Superset regarding the _caseless_ columns. Should they become lowercase (like the physical datasets already are) or UPPERCASE (like the virtual datasets already are)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017882023


   After giving this some more thought, I agree that the best solution would probably be to store the object names in their native case (UPPERCASE), as that would always work with quotes. If we do that, then all physical table column definitions would change from lowercase to UPPERCASE (unless MixedCase). I'm not sure we can do a database migration to fix old datasets reliably, so to deal with backwards compatibility issues, we could potentially add a config flag to the dataset metadata, something like `legacy_mode`, which we'd set to `True` for existing Snowflake datasets, but that would not be set on new datasets. When set to `True`, column names would be set with the current normalization logic, but when disabled would return the UPPERCASE name. Then existing charts wouldn't break, and users would have the option of migrating those by hand if they so wish.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro edited a comment on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
villebro edited a comment on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017459122


   Ok, so this all stems from the design decision Oracle-like databases have made, which I generally call "lowercase means uppercase means caseless", which IMO is a terrible assumption (I _really_ wish Snowflake hadn't gone with this design). I may be off on some details, but basically it boils down to this: When you write
   
   ```sql
   select column
   from table
   ```
   
   and the column name is caseless (=stored in metadata as UPPERCASE), the resulting column name in the query result is `COLUMN`. You get the same result by writing (correct me if I misremember these)
   
   ```sql
   select Column
   from table
   ```
   
   or
   
   ```sql
   select COLUMN
   from table
   ```
   
   or even
   
   ```sql
   select "COLUMN"
   from table
   ```
   
   but an error when writing
   
   ```sql
   select "column"
   from table
   ```
   
   because the column name isn't defined case-sensitively as "column", but rather stored in the metadata as "COLUMN" (=caseless). However, when you check the column name using `inspector.get_columns`, the column shows up as `column`, i.e. lowercase, which kind of isn't really true using Oracle logic, but ends up working in Superset. The Snowflake dialect does these to/from conversions here: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217-L234 , and these functions kind of make sense when you know the assumptions behind case handling, but is super confusing when you're trying to write code that handles the cases correctly.
   
   Now, we actually have a method in `BaseEngineSpec` called `get_column_spec` which takes the source of the column definition as one argument: https://github.com/apache/superset/blob/035638c95869d0d4b7c9d44f13e88e3535ecdf4c/superset/db_engine_specs/base.py#L1313-L1320
   What we could do here is add `column_name` as an argument and add a field `normalized_name` to the `ColumnSpec` type and return the normalized column name on affected dbs (Oracle, Snowflake et al) if the column metadata originates from a query (=virtual table), but leave it unchanged for physical tables. This would just need to be implemented in the SQLAlchemy model and shouldn't be lots of work, but it would be really important to do super thorough testing to make sure it handles all cases correctly (caseless, UPPERCASE, "lowercase" and "Mixed Case").
   
   This probably sounds really confusing, so maybe it makes sense to have a kickoff call to get this work started to make sure we fully understand the problem and have a clear proposal for solving this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai closed issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai closed issue #18085:
URL: https://github.com/apache/superset/issues/18085


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] agusfigueroa-htg commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
agusfigueroa-htg commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1018411477


   Hi all, 
   
   First of all thanks for the discussion. It's very enriching, and the proposed solution would reach case consistency and solve the issue as mentioned. 
   
   After sharing this with our team another option that popped up, **in case this only affects filters interoperability**,  is to add the option to make the filters case insensitive. 
   
   In that case, instead of looking for columns named the same (`dataset1.colname=dataset2.colname`), we can apply the filters "case insensitive" like (`lower(dataset1.colname)=lower(dataset2.colname)`), creating a flag for it in case users want to apply it or not.
   
   This would lay fully on the Superset side and would be transparent across different databases and what they decide to do in the future (return uppercase, lowercase, mix case or so). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016147866


   From the linked upstream issues I understand that it is unlikely that this porblem will be fixed upstream.
   
   I understood from these issues that the correct case can be extracted from the cursor, but it seems that we are not currently doing so in Superset.
   Furthermorte, I am not quite sure what the desired behavior in Superset would actually be.
   
   So, in my eyes there first needs to be a design decision.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] agusfigueroa-htg commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
agusfigueroa-htg commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1019274941


   Good to see that we are all in the same boat :) and the points presented by you two are pretty much valid - the filter one is more of a quick escape rather than a real fix. 
   
   @villebro if I got you correctly, it makes sense to work on how the object names are stored in Superset. I don't know where that is defined in the code (I am new to the repo!), but if you have some references at a hand I am happy to help here. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro edited a comment on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
villebro edited a comment on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017591624


   _In theory_, i think the optimal solution would be to have the column cases equal for both physical and virtual datasets. I _think_ this problem would go away if we called the normalization function on the column names when we fetch column names from the query metadata when we create a new virtual dataset. However, I believe this would break backward compatibility with previously created virtual tables, as refreshing the column metadata for pre-existing virtual tables would change the case from upper case to lower case, which would probably break all existing charts. We could, of course, do a database migration which would update all existing chart metadata, but that would be very tricky, as column names appear in lots of different controls (we also don't know what custom controls containing column names some orgs might have if they've created custom viz plugins).
   
   Therefore, the approach I'm proposing is to rather be aware of the dataset type, and then be able to work around this problem at query creation time, i.e. do the normalization there. As the dataset metadata would be unaffected (caseless virtual dataset column names would still be UPPER CASE), this change would be backwards compatible, and therefore wouldn't require any database migrations.
   
   However, if column cases are different for physical vs virtual tables, there is the risk that native filters will not work properly on datasets of different types. So this may require some further planning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1019081142


   @agusfigueroa-htg having the column matching if dashboard filters case-insensitive is a quick win in my eyes.
   However, some edge cases remain:
   
   1. When filters are handled in Jinja, users will need to implement their own case-insensitivity.
   2. When users convert a physical to a virtual dataset, or vice versa, the column case will flip, causing possible bugs. Such a conversation may be needed, e.g., when the source model now includes soft deletes which then need to be filtered out, or, the other way round, when it is decided to convert a virtual dataset into a table or view.
   3. Users might be confused by the case inconsistency and they might not be aware of the case-insensitivity of the filters.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016145399


   @nytai The linked upstream issue is referring to another issue that had been closed as the issue can be handled by the client's code: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157#issuecomment-913699495 
   
   @villebro was involved in closing the related upstream issue: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/73
   
   So it seems to me as a bit of a ping pong to where this issue is to be handled.
   
   I'd be glad if Ville could comment on this. Maybe we could also have a Slack discussion of video call if needed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017908535


   @villebro this is what I had in mind, too.
   The backward compatibility definitely needs some good care and I think that such a flag/switch would work nicely, although it introduces some ugly legacy.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1023562084


   Stumbled on a related, merged PR: #5827


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016114073


   This issue is the same as the one you linked https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157#issuecomment-807922786 
   
   
   Superset is using that library to connect to snowflake so it needs to be fixed there, there’s nothing that can be done in superset to fix this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016677388


   Although using the cursor to get the column names is a likely fix I do feel strongly that the database driver should not be applying any case transformations on the payload returned from the db. 
   
   Since @villebro did propose that as a solution to the issue, I agree, it would be good to his thoughts on the matter 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017204425


   ok, revisiting this it seems unlikely that that snowflake-sqlalchemy will stop denormalizing the column names in the payload given how long it's been open and that the cursor description seems to be an acceptable workaround. 
   
   I now agree that this could be addressed in superset given the challenges it presents on dashboards with charts that are backed by virtual & physical datasets. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1019084851


   @agusfigueroa-htg that's a really good idea, and could be useful even for other databases. However, I do feel this is an issue we probably need to resolve, and the points raised by @rumbin are IMO all very valid. It's also good to keep in mind that this not only affects Snowflake, but even other Oracle-like dialects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017771314


   I was originally thinking that what would make the most sense is for superset to use the same case that it's defined in the db (ie, most likely UPPERCASE for snowflake, but not _always_), however since the sqlalchemy dialect is doing this "normalization" there really is no way to know for sure what the case is in the db. We could update the `get_column_spec` to instead use a describe table stmt (or whatever the dialect is using to get table names) and get the column names from the cursor (not sure if this would actually work). 
   
   However, this gets very complicated when we take into consideration all the existing datasets out there, as users who upgrade to a version with this fix would start seeing issues when they run the sync columns action and the case changes (which would break all their existing dashboard filters, etc). 
   
   I too really wish snowflake had not gone with this UPPERCASE by default design, and I especially wish that the sqlalchemy dialect hadn't gone with this "normalize" column names approach. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] agusfigueroa-htg commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
agusfigueroa-htg commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1039176940


   Hi @villebro ! I assume we want to solve this on the connector side, is that correct? If I get a rough idea of where to start looking at, I hope I can contribute on taking the first steps :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1016111717


   Would it be feasible to have the case conversion configurable for each database connection?
   
   I am thinking about ways to fix this without breaking existing datasets/charts/dashboards that would suddenly have their colunm name case changed upon resync of the column names...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017459122


   Ok, so this all stems from the design decision Oracle-like databases have made, which I generally call "lowercase means uppercase means caseless", which IMO is a terrible assumption (I _really_ wish Snowflake hadn't gone with this design). I may be off on some details, but basically it boils down to this: When you write
   
   ```sql
   select column
   from table
   ```
   
   and the column name is caseless (=stored in metadata as UPPERCASE), the resulting column name in the query result is `COLUMN`. You get the same result by writing (correct me if I misremember these)
   
   ```sql
   select Column
   from table
   ```
   
   or
   
   ```sql
   select COLUMN
   from table
   ```
   
   or even
   
   ```sql
   select "COLUMN"
   from table
   ```
   
   but an error when writing
   
   ```sql
   select "column"
   from table
   ```
   
   because the column name isn't defined case-sensitively as "column", but rather stored in the metadata as "COLUMN" (=caseless). However, when you check the column name using `inspector.get_columns`, the column shows up as `column`, i.e. lowercase, which kind of isn't really true using Oracle logic, but ends up working in Superset. The Snowflake dialect does these to/from conversions here: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217-L234 , and these functions kind of make sense when you know the assumptions behind case handling, but is super confusing when you're trying to write code that handles the cases correctly.
   
   Now, we actually have a method in `BaseEngineSpec` called `get_column_spec` which takes the source of the column definition as one argument: https://github.com/apache/superset/blob/035638c95869d0d4b7c9d44f13e88e3535ecdf4c/superset/db_engine_specs/base.py#L1313-L1320
   What we could do here is add a field `normalized_name` to the `ColumnSpec` type and return the normalized column name on affected dbs (Oracle, Snowflake et) if the column metadata originates from a query (=virtual table), but leave it unchanged for physical tables. This would just need to be implemented in the SQLAlchemy model and shouldn't be lots of work, but it would be really important to do super thorough testing to make sure it handles all cases correctly (caseless, UPPERCASE, "lowercase" and "Mixed Case").
   
   This probably sounds really confusing, so maybe it makes sense to have a kickoff call to get this work started to make sure we fully understand the problem and have a clear proposal for solving this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] villebro commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
villebro commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017591624


   _In theory_, i think the optimal solution would be to have the column cases equal for both physical and virtual datasets. I _think_ this problem would go away if we called the normalization function on the column names when we fetch column names from the query metadata when we create a new virtual dataset. However, I believe this would break backward compatibility with previously created virtual tables, as refreshing the column metadata for pre-existing virtual tables would change the case from upper case to lower case, which would probably break all existing charts. We could, of course, do a database migration which would update all existing chart metadata, but that would be very tricky, as column names appear in lots of different controls (we also don't know what custom controls containing column names some orgs might have if they've created custom viz plugins).
   
   Therefore, the approach I'm proposing is to rather be aware of the dataset type, and then be able to work around this problem at query creation time, i.e. do the normalization there. As the dataset metadata would be unaffected (caseless virtual dataset column names would still be UPPER CASE), this change would be backwards compatible, and therefore wouldn't require any database migrations.
   
   However, if column cases are different for physical vs virtual tables, there is the risk that native filters will work properly on datasets of different types. So this may require some further planning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] nytai commented on issue #18085: Snowflake: Inconsistent column name case

Posted by GitBox <gi...@apache.org>.
nytai commented on issue #18085:
URL: https://github.com/apache/superset/issues/18085#issuecomment-1017867810


   so this is the function in sqlalchemy-snowflake that returns the table names: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L418-L487 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org