Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:02:17 UTC

[jira] [Updated] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

     [ https://issues.apache.org/jira/browse/SPARK-21158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-21158:
---------------------------------
    Labels: bulk-closed easyfix features sparksql windows  (was: easyfix features sparksql windows)

> SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-21158
>                 URL: https://issues.apache.org/jira/browse/SPARK-21158
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>         Environment: Windows 10
> IntelliJ 
> Scala
>            Reporter: Kathryn McClintic
>            Priority: Minor
>              Labels: bulk-closed, easyfix, features, sparksql, windows
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> When working with SQL table names in Spark SQL, we have noticed some issues with case sensitivity.
> If you set the spark.sql.caseSensitive setting to true, Spark SQL stores table names in the way they were provided. This is correct.
> If you set the spark.sql.caseSensitive setting to false, Spark SQL stores table names in lowercase.
> Then, we use the function sqlContext.tableNames() to get all the tables in our DB and check whether this list contains(<"string of table name">) to determine if we have already created a table. If case sensitivity is turned off (false), this check should find the table name in the list regardless of case.
> However, it only matches the lowercase version of the stored table name. Therefore, if you pass in a camel-case or uppercase table name, this check returns false when in fact the table does exist.
> The root cause of this issue is in the function SparkSession.Catalog.ListTables().
> For example:
> In your SQL context, you have five tables and you have chosen spark.sql.caseSensitive=false, so it stores your tables in lowercase:
> carnames
> carmodels
> carnamesandmodels
> users
> dealerlocations
> When running your pipeline, you want to see if you have already created the temp join table 'carnamesandmodels'. However, you have stored its name in a constant that reads CarNamesAndModels for readability.
> So you can use the function
> sqlContext.tableNames().contains("CarNamesAndModels").
> This should return true, because we know it has already been created, but it will currently return false since "CarNamesAndModels" is not in lowercase.
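> For illustration, here is a minimal Scala sketch of the scenario above (the local master, app name and variable names are just assumptions for this example, not part of the actual pipeline):
>
> import org.apache.spark.sql.SparkSession
>
> val spark = SparkSession.builder()
>   .master("local[*]")
>   .appName("spark-21158-example")
>   .config("spark.sql.caseSensitive", "false")   // case-insensitive mode
>   .getOrCreate()
>
> // Register a temp table under a camel-case name; with case sensitivity off,
> // the catalog reportedly stores it as "carnamesandmodels".
> spark.range(1).createOrReplaceTempView("CarNamesAndModels")
>
> // tableNames() then returns the lowercased name, and contains() is an exact,
> // case-sensitive match, so this prints false even though the table exists.
> println(spark.sqlContext.tableNames().contains("CarNamesAndModels"))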
> The responsibility of lowercasing the name passed into the .contains method should not be put on the Spark user. This should be done by Spark SQL if case sensitivity is set to false.
> Proposed solutions:
> - Setting spark.sql.caseSensitive to false in the SQL context should make the context agnostic to case, but not change how the table is stored
> - There should be a custom contains method for ListTables() which converts the table name to lowercase before checking (a caller-side check along these lines is sketched below)
> - SparkSession.Catalog.ListTables() should return the list of tables in the format they were provided in instead of all lowercase.
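> As a caller-side workaround until this is addressed, the check can be made case-insensitive explicitly. The helper below is hypothetical (not part of the Spark API) and assumes a SparkSession like the one in the sketch above:
>
> import org.apache.spark.sql.SparkSession
>
> // Hypothetical helper: case-insensitive existence check against the catalog.
> def tableExistsIgnoreCase(spark: SparkSession, name: String): Boolean =
>   spark.sqlContext.tableNames().exists(_.equalsIgnoreCase(name))
>
> // Returns true regardless of the casing the catalog stored.
> tableExistsIgnoreCase(spark, "CarNamesAndModels")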


