Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2016/05/09 01:02:10 UTC

[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/12993

    [SPARK-15217] [SQL] Always Case Insensitive in HiveSessionState

    #### What changes were proposed in this pull request?
    In `HiveSessionState`, the session state of a `SparkSession` backed by Hive, analysis should not be case sensitive because the underlying Hive metastore is case insensitive.
    
    For example, 
    ```SQL
    CREATE TABLE tab1 (C1 int);
    SELECT C1 FROM tab1
    ```
    With the current implementation, this query fails with the following error because the column name is always stored in lower case:
    ```
    cannot resolve '`C1`' given input columns: [c1]; line 1 pos 7
    org.apache.spark.sql.AnalysisException: cannot resolve '`C1`' given input columns: [c1]; line 1 pos 7
    ```
    
    This PR makes `HiveSessionState` always use case-insensitive analysis, regardless of whether users set `spark.sql.caseSensitive` to true or false.
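The failure above comes down to which name-comparison function the analyzer uses when matching a referenced column against the stored columns. The following is a minimal Python sketch of that idea, not Spark's actual code: the names `case_sensitive_resolution`, `case_insensitive_resolution`, and `resolve_column` are hypothetical and exist only for illustration.

```python
from typing import Callable, List, Optional

# A resolver decides whether two identifier names refer to the same thing.
Resolver = Callable[[str, str], bool]


def case_sensitive_resolution(a: str, b: str) -> bool:
    """Names match only if they are identical, including case."""
    return a == b


def case_insensitive_resolution(a: str, b: str) -> bool:
    """Names match if they are equal after lower-casing both sides."""
    return a.lower() == b.lower()


def resolve_column(name: str, input_columns: List[str],
                   resolver: Resolver) -> Optional[str]:
    """Return the first input column that matches `name`, or None."""
    for column in input_columns:
        if resolver(name, column):
            return column
    return None


# Assumption for this sketch: the catalog stored the column as lower-case "c1".
stored_columns = ["c1"]

print(resolve_column("C1", stored_columns, case_sensitive_resolution))    # None
print(resolve_column("C1", stored_columns, case_insensitive_resolution))  # c1
```

With the case-sensitive resolver, `C1` cannot be matched against `[c1]`, which mirrors the `cannot resolve` error; with the case-insensitive resolver the lookup succeeds.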
    
    #### How was this patch tested?
    Added related test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark caseSensitive

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12993.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12993
    
----
commit d7d96c34fde79d7078b27733f553deda6bb39fd4
Author: gatorsmile <ga...@gmail.com>
Date:   2016-05-09T00:35:43Z

    case insensitive in Hive

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217762595
  
    **[Test build #58114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58114/consoleFull)** for PR 12993 at commit [`d7d96c3`](https://github.com/apache/spark/commit/d7d96c34fde79d7078b27733f553deda6bb39fd4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217757584
  
    **[Test build #58114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58114/consoleFull)** for PR 12993 at commit [`d7d96c3`](https://github.com/apache/spark/commit/d7d96c34fde79d7078b27733f553deda6bb39fd4).




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217762675
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58114/




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217770263
  
    cc @cloud-fan  @rxin @yhuai @andrewor14 




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile closed the pull request at:

    https://github.com/apache/spark/pull/12993




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217955839
  
    We want to eliminate HiveSessionState, so this is a step back. It is also a step back in that it makes the behavior of the Hive and non-Hive session states diverge.
    
    I don't think we should support this. For now, we should just make case sensitivity an internal config that is not exposed to users. Our case sensitivity support is somewhat broken and does not follow the SQL standard (e.g. in Postgres, quoting an identifier makes it case sensitive), so the simplest solution is to not support it for now and 
    
    See https://issues.apache.org/jira/browse/SPARK-15229




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217972795
  
    Agreed. Let me close this now. Thanks!




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217762674
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217777713
  
    Agreed. We need to be careful when deciding on the design. This PR only restores our previous behavior in `HiveContext`.
    
    Case sensitivity is complicated and platform/vendor-specific. The summary below is based on my own search and might not be 100% correct.
    
    - For unquoted identifiers, the answer for SQL:2003 and DB2 is no (not case sensitive). Oracle and SQL Server are configurable, but the default is also no.
    - For quoted/delimited identifiers, most traditional RDBMSs are case sensitive. Hive is special. Starting from Hive 0.13, Hive supports quoted identifiers in column names (https://issues.apache.org/jira/browse/HIVE-6013). However, this does not apply to table or database names in Hive.
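The quoted versus unquoted distinction can be made concrete with a small sketch. This is an illustration only, assuming PostgreSQL-style rules: unquoted identifiers are folded to lower case, while double-quoted identifiers keep their exact spelling (with `""` as an escaped quote). The helper names `normalize_identifier` and `same_identifier` are hypothetical, and other systems fold or compare differently.

```python
def normalize_identifier(raw: str) -> str:
    """Return the canonical form of an identifier as written in a query.

    Assumed rules for this sketch (PostgreSQL-style):
    - "Quoted" identifiers keep their case; a doubled quote is an escape.
    - Unquoted identifiers are folded to lower case.
    """
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1].replace('""', '"')
    return raw.lower()


def same_identifier(a: str, b: str) -> bool:
    """Two identifiers are the same if their canonical forms are equal."""
    return normalize_identifier(a) == normalize_identifier(b)


print(same_identifier("C1", "c1"))      # True: both unquoted, both fold to c1
print(same_identifier('"C1"', "c1"))    # False: quoted C1 keeps its upper case
print(same_identifier('"c1"', "C1"))    # True: quoted c1 equals folded C1
```

Under these assumed rules, unquoted names behave case-insensitively, and quoting is the only way to obtain a case-sensitive name.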




[GitHub] spark pull request: [SPARK-15217] [SQL] Always Case Insensitive in...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/12993#issuecomment-217773874
  
    I think we need to discuss it more:
    
    1. Should we allow case sensitivity to be configurable? It is sometimes out of our control, as with the Hive catalog, which is always case insensitive.
    2. Besides case sensitivity, should we also introduce the concept of case preservation for external catalogs?
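To illustrate the difference between the two concepts in the questions above, here is a minimal sketch of a catalog that is case insensitive for lookups but case preserving for stored names. The class `CasePreservingCatalog` and its methods are hypothetical and do not correspond to any Spark or Hive API.

```python
from typing import Dict, List, Tuple


class CasePreservingCatalog:
    """Toy catalog: lookups ignore case, stored names keep their spelling."""

    def __init__(self) -> None:
        # Key: lower-cased table name. Value: (name as created, column names).
        self._tables: Dict[str, Tuple[str, List[str]]] = {}

    def create_table(self, name: str, columns: List[str]) -> None:
        key = name.lower()
        if key in self._tables:
            raise ValueError(f"table already exists: {self._tables[key][0]}")
        self._tables[key] = (name, list(columns))

    def lookup(self, name: str) -> Tuple[str, List[str]]:
        """Find a table regardless of case; raises KeyError if absent."""
        return self._tables[name.lower()]


catalog = CasePreservingCatalog()
catalog.create_table("Tab1", ["C1"])

# The lookup is case insensitive, but the original spelling is returned.
print(catalog.lookup("TAB1"))  # ('Tab1', ['C1'])
```

A catalog that is case insensitive but not case preserving would instead store and return `tab1` and `c1`, which is the behavior that produced the `[c1]` in the error message from the PR description.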

