Posted to issues@spark.apache.org by "Kathryn McClintic (JIRA)" <ji...@apache.org> on 2017/06/21 00:03:00 UTC

[jira] [Created] (SPARK-21158) SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity

Kathryn McClintic created SPARK-21158:
-----------------------------------------

             Summary: SparkSQL function SparkSession.Catalog.ListTables() does not handle spark setting for case-sensitivity
                 Key: SPARK-21158
                 URL: https://issues.apache.org/jira/browse/SPARK-21158
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
         Environment: Windows 10
IntelliJ 
Scala
            Reporter: Kathryn McClintic
            Priority: Minor


When working with table names in Spark SQL, we have noticed some issues with case sensitivity.

If you set the spark.sql.caseSensitive setting to true, Spark SQL stores table names exactly as they were provided. This is correct.

If you set spark.sql.caseSensitive to false, Spark SQL stores table names in lower case.

Then we use sqlContext.tableNames() to get all the tables in our database and check whether the returned list contains a given table name, to determine whether we have already created that table. If case sensitivity is turned off (false), this check should find the table name in the list regardless of case.

However, it only matches the lower-case version of the stored name. Therefore, if you pass in a camel-case or upper-case table name, the check returns false even though the table does exist.

The root cause of this issue is in SparkSession.catalog.listTables().

For example:
In your SQL context you have five tables, and you have chosen spark.sql.caseSensitive=false, so the table names are stored in lower case:
carnames
carmodels
carnamesandmodels
users
dealerlocations

When running your pipeline, you want to check whether you have already created the temporary join table 'carnamesandmodels'. However, for readability you have stored its name as a constant that reads CarNamesAndModels.

So you call:
sqlContext.tableNames().contains("CarNamesAndModels")
This should return true, because we know the table has already been created, but it currently returns false since CarNamesAndModels is not stored in lower case.
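
A rough sketch of the failing check, assuming a local SparkSession named spark (the DataFrame contents are placeholders):

import spark.implicits._

spark.conf.set("spark.sql.caseSensitive", "false")

// Register the temp table under a camel-case name.
Seq(("Tesla", "Model S")).toDF("name", "model")
  .createOrReplaceTempView("CarNamesAndModels")

// The catalog lists the name in lower case, so an exact-case lookup fails:
spark.sqlContext.tableNames().contains("CarNamesAndModels")  // false
spark.sqlContext.tableNames().contains("carnamesandmodels")  // true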

Proposed solutions:
- Setting spark.sql.caseSensitive to false should make the SQL context case-agnostic for lookups, not change how table names are stored.
- Provide a contains-style check for listTables() that lower-cases the table name before comparing (see the sketch below).
- SparkSession.catalog.listTables() should return table names in the form they were provided instead of in all lower case.
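
As an interim workaround on the caller's side (not a fix to the catalog itself), the membership check can be made case-insensitive. A minimal sketch, assuming the same spark session as above and a hypothetical helper name:

// Case-insensitive table lookup over the catalog listing (workaround only).
def tableExistsIgnoreCase(name: String): Boolean =
  spark.sqlContext.tableNames().exists(_.equalsIgnoreCase(name))

tableExistsIgnoreCase("CarNamesAndModels")  // true, regardless of stored casing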




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org