You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2022/04/19 10:04:59 UTC

[GitHub] [superset] rumbin opened a new issue, #19773: [SIP-82] Case-insensitive handling of datasets' column names

rumbin opened a new issue, #19773:
URL: https://github.com/apache/superset/issues/19773

   
   !!! warning "This document is still WIP; review of @villebro, @agusfigueroa-htg is required."
   
   -------------
   
   
   ## [SIP-82] Case-insensitive handling of datasets' column names
   
   ### Motivation
   
   The default case (upper/lower) and case sensitivity of object names (schemas, tables, columns,...) is handled very differently in the various DSMSes that are supported by Superset.  
   E.g., Postgres interprets unquoted column names as lowercase while Oracle and Snowflake treat them as UPPERCASE.
   
   Superset is currently not consistently treating the case of column names. As a result, _virtual_ datasets of an UPPERCASE DB like Snowflake are represented in UPPERCASE, while _physical_ datasets of these DBs have lowercase column names in Superset.
   See #18085 for more details and discussion on how this ought to be fixed.
   
   The main issues, that arise from this inconsistency, are:
   
   1. Dashboard filters refer to case-sensitive representations of the columns. If a dashboard contains charts that are based on physical _and_ virtual datasets, the filters will only be applied to the  ones there the case of the column name matches.
   2. If a physical dataset is later on changed to become a virtual dataset (or vice versa), the case of the column names changes and existing charts and filters will be harmed. Such changes are pretty common, e.g., when a virtual dataset is promoted to become a view in the database or when an existing table needs some more logic applied (e.g. filtering of soft-deleted records).
   3. Migration of the data warehouse system — e.g. from Postgres to Snowflake, while reproducing the data marts — will cause the column names to potentially change in case, thus breaking existing charts.
   
   
   ### Proposed Change
   
   In order to find a database-agnostic solution which dows not require upstream changes on SQLAlchemy drivers, this issue my best be tackled by making Superset handle column names _case-insensitively_. I.e., all columns should _internally_ be treated in lowercase.
   
   There is a small risk of datasets having two columns that would translate to the same case-insensitive (lowercase) representation of the column name. However, @villebro [feels](https://github.com/apache/superset/issues/18085#issuecomment-1102213239) that only very few people would really have a need to distinguish columns based on their case.
   
   However, we need to ensure that e.g. CamelCase column names keep their human readability. Thus, I suggest to auto-fill the `label` of the dataset column (a.k.a. the `verbose_name`) with it's original, case-sensitive, name in cases where this field is not already filled (do not overwrite existing information).
   
   ### New or Changed Public Interfaces
   
   @villebro, @agusfigueroa-htg - I need your input regarding this and the following sections of this SIP...
   
   
   ### New dependencies
   
   
   ### Migration Plan and Compatibility
   
   
   ### Rejected Alternatives
   
   Consistency could be introduced on a per-DBMS basis, i.e. per SQLAlchemy driver, so all datasets in UPPERCASE DBMSes would be represented in UPPERCASE, regardless of whether they are physical or virtual datasets. Thsi would fix the aforementioned issues 1 and 2. However, the 3rd issue would not be covered. Furthermore, this issue may be more error-prone, when introducing support for more DBMSes or when upstream changes occur.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] agusfigueroa-htg commented on issue #19773: [SIP-82] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
agusfigueroa-htg commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1117529727

   @rumbin @villebro sorry, this took me longer than expected. First iteration can be found below :) 
   
   **New or Changed Public Interfaces**
   
   We will need:
   
   1.  to add the verbose_name as a label for the names of the columns - create the fields as metadata for the tables and the corresponding UI representation.
   2. For each column, add a parameter that "it's treated case-insensitive", set by default to true. This would then cover the cases where the to-lowercase translation is the same. Again, we need to add this as metadata for the tables and the corresponding table columns UI representation.
   
   **New dependencies**
   
   I don't see the need for any new packages.
   
   **Migration Plan and Compatibility**
   
   As this is metadata to be added at the column level, I don't see any URLs or references that may break. We will need though for visualizations to refer to "verbose_name", and for filters and calculations to the treated "column-name", by default lowercase and eventually modified by the check mark I stated in (2).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] [SIP-84] Case-insensitive handling of datasets' column names [superset]

Posted by "JohnDietrich-Pepper (via GitHub)" <gi...@apache.org>.
JohnDietrich-Pepper commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1826413898

   This is a big problem for us.  We have some physical tables and some virtual tables that exist on the same dashboard connected to snowflake.  There appears to be no way to force the column names in the virtual tables to lowercase and as a result the filters down apply to charts on the dashboard referencing the same underlying data but through a different structure.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] yousoph commented on issue #19773: [SIP-84] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
yousoph commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1144202651

   Noticed there were two SIP-82s, I've renumbered this one to SIP-84


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rumbin commented on issue #19773: [SIP-82] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
rumbin commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1102793632

   @agusfigueroa-htg same here ;-).
   
   https://github.com/apache/superset/issues/5602 might help, I guess.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rusackas closed issue #19773: [SIP-84] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
rusackas closed issue #19773: [SIP-84] Case-insensitive handling of datasets' column names
URL: https://github.com/apache/superset/issues/19773


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] agusfigueroa-htg commented on issue #19773: [SIP-82] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
agusfigueroa-htg commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1102509586

   Hi @rumbin! Sorry, first time doing this. 
   
   Are there any established examples/guidelines that I can look at, so to work on the sections left of the SIP? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rusackas commented on issue #19773: [SIP-84] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
rusackas commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1153431470

   @rumbin thanks for this SIP. If it's ready for official discussion, in a lead-up to voting, please post it to the dev@superset.apache.org email list as a [DISCUSS] thread. If you need help navigating those waters, say the word and I'll be happy to help.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] rusackas commented on issue #19773: [SIP-84] Case-insensitive handling of datasets' column names

Posted by GitBox <gi...@apache.org>.
rusackas commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1343168914

   Closing this due to apparent lack of interest/activity here or on the mailing list. We'll also consider the SIP/proposal process closed as well. If you want to rekindle this proposal, please re-open this Issue, and send a new [DISCUSS] thread to the dev@ mailing list. Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


Re: [I] [SIP-84] Case-insensitive handling of datasets' column names [superset]

Posted by "villebro (via GitHub)" <gi...@apache.org>.
villebro commented on issue #19773:
URL: https://github.com/apache/superset/issues/19773#issuecomment-1826421605

   @JohnDietrich-Pepper this issue should be fixed as of #24471 and especially #24982 (this makes the change backwards compatible). Please let me know if you need help implementing the changes in those PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org