You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by dongjoon-hyun <gi...@git.apache.org> on 2017/10/22 18:07:04 UTC

[GitHub] spark pull request #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql...

GitHub user dongjoon-hyun opened a pull request:

    https://github.com/apache/spark/pull/19552

    [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.caseSensitiveInferenceMode` by default

    ## What changes were proposed in this pull request?
    
    In Spark 2.2.0, `spark.sql.hive.caseSensitiveInferenceMode` has a critical issue by default. 
    
    - [SPARK-19611](https://issues.apache.org/jira/browse/SPARK-19611) uses `INFER_AND_SAVE` at 2.2.0 since Spark 2.1.0 breaks some Hive tables backed by case-sensitive data files.
    
      > This situation will occur for any Hive table that wasn't created by Spark or that was created prior to Spark 2.1.0. If a user attempts to run a query over such a table containing a case-sensitive field name in the query projection or in the query filter, the query will return 0 results in every case.
    
    - However, [SPARK-22306](https://issues.apache.org/jira/browse/SPARK-22306) reports this also corrupts Hive Metastore schema by removing bucketing information (BUCKETING_COLS, SORT_COLS) and changing owner. This is undesirable side-effects. Hive Metastore is a shared resource and Spark should not corrupt it by default. 
    
    - Since Spark 2.3.0 supports Bucketing, BUCKETING_COLS and SORT_COLS look okay at least. However, we need to figure out the issue of changing owners. Also, we cannot backport bucketing patch into `branch-2.2`. We need to verify this option with more tests before releasing 2.3.0.
    
    This PR proposes to recover that option back to `NEVER_INFO` like Spark 2.2.0 by default. Users can take a risk by enabling `INFER_AND_SAVE` by themselves.
    
    ## How was this patch tested?
    
    Pass the existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-22329

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19552.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19552
    
----
commit a256627dbc2772e69cd0f9f2aa43b384165e3657
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-10-22T17:59:15Z

    [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.caseSensitiveInferenceMode` by default

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19552#discussion_r146385329
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -388,7 +388,7 @@ object SQLConf {
         .stringConf
         .transform(_.toUpperCase(Locale.ROOT))
         .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
    -    .createWithDefault(HiveCaseSensitiveInferenceMode.INFER_AND_SAVE.toString)
    +    .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
    --- End diff --
    
    We can improve the documentation instead of changing the default. 
    
    If my understanding is right, this occurs only when Spark SQL tries to read the table created by the other tables.  


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.c...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19552
  
    I close this since #19622 is merged.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.c...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19552
  
    Thank you for review, @gatorsmile and @budde .


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19552
  
    **[Test build #82961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82961/testReport)** for PR 19552 at commit [`a256627`](https://github.com/apache/spark/commit/a256627dbc2772e69cd0f9f2aa43b384165e3657).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19552
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82961/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.c...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19552
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #19552: [SPARK-22329][SQL] Use NEVER_INFER for `spark.sql.hive.c...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19552
  
    **[Test build #82961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82961/testReport)** for PR 19552 at commit [`a256627`](https://github.com/apache/spark/commit/a256627dbc2772e69cd0f9f2aa43b384165e3657).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org