Posted to reviews@spark.apache.org by seancxmao <gi...@git.apache.org> on 2018/10/28 06:57:43 UTC

[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

GitHub user seancxmao opened a pull request:

    https://github.com/apache/spark/pull/22868

    [SPARK-25833][SQL][DOCS] Update migration guide for Hive view compatibility

    ## What changes were proposed in this pull request?
    Both Spark and Hive support views. However, in some cases views created by Hive are not readable by Spark. For example, if column aliases are not specified in view definition queries, both Spark and Hive generate alias names, but in different ways. For Spark to be able to read views created by Hive, users should explicitly specify column aliases in view definition queries.
    
    Given that Hive and Spark are commonly used together in enterprise data warehouses, this PR documents this compatibility issue explicitly to help users troubleshoot it.
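
The naming mismatch described above can be illustrated with a small Python sketch. This is a toy model, not Hive's or Spark's actual code: the function names `hive_like_names` and `spark_like_names` are hypothetical, the `_c0`-style positional names and the expression-text names such as `(c1 + 1)` are taken from the plan output quoted later in this thread, and the expression strings are assumed to be already in the form Spark prints them.

```python
from typing import List, Optional, Tuple

# (expression text, explicit alias or None); a hypothetical representation.
Column = Tuple[str, Optional[str]]


def hive_like_names(columns: List[Column]) -> List[str]:
    """Assumed Hive-style naming: unaliased columns get positional names."""
    return [
        alias if alias is not None else f"_c{i}"
        for i, (_, alias) in enumerate(columns)
    ]


def spark_like_names(columns: List[Column]) -> List[str]:
    """Assumed Spark-style naming: unaliased columns are named after their expression."""
    return [alias if alias is not None else expr for expr, alias in columns]


unaliased = [("(c1 + 1)", None), ("upper(c2)", None)]
aliased = [("(c1 + 1)", "inc_c1"), ("upper(c2)", "upper_c2")]

# Without aliases the two engines disagree on the column names.
print(hive_like_names(unaliased), spark_like_names(unaliased))
# With explicit aliases both produce the same names.
print(hive_like_names(aliased) == spark_like_names(aliased))
```

In this model the two schemes only agree when every column carries an explicit alias, which is the practice the documentation change recommends.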
    
    ## How was this patch tested?
    Docs are manually generated and checked locally.
    
    ```
    SKIP_API=1 jekyll serve
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/seancxmao/spark SPARK-25833

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22868.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22868
    
----
commit e5b3a11c2cedcbbe528cc72d465ab6e27f5215e3
Author: seancxmao <se...@...>
Date:   2018-10-28T06:46:10Z

    update migration guide for hive view compatibility

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229059209
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,22 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    +    ```
    +
    +    Instead, you should create `v1` as below with column aliases explicitly specified.
    +
    +    ```
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1 AS inc_c1, upper(c2) AS upper_c2 FROM t1) t2;
    --- End diff --
    
    Also, let's update this one together.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98156/
    Test FAILed.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229059043
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,22 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    Could you simplify more by removing `CREATE TABLE` and using the following view creation?
    ```sql
    CREATE VIEW v1 AS SELECT * FROM (SELECT c + 1, upper(c) FROM (SELECT 1 c) t1) t2;
    ```


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98156 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98156/testReport)** for PR 22868 at commit [`e5b3a11`](https://github.com/apache/spark/commit/e5b3a11c2cedcbbe528cc72d465ab6e27f5215e3).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98197 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98197/testReport)** for PR 22868 at commit [`84d9e8c`](https://github.com/apache/spark/commit/84d9e8c20c12e5e3e3150de9a97f581858e3c9bd).


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229064730
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,22 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    Thanks for the finding. I'd like to remove `upper(c)` like the following.
    ```sql
    CREATE VIEW v1 AS SELECT * FROM (SELECT c + 1 FROM (SELECT 1 c) t1) t2;
    ```


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by seancxmao <gi...@git.apache.org>.

Github user seancxmao commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    retest this please


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98243/testReport)** for PR 22868 at commit [`26b4dee`](https://github.com/apache/spark/commit/26b4dee5cd8e4e5fe7e1a34543bad17e05b9b783).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Merged to `master` since we are voting now. We can backport this to `branch-2.4`/`branch-2.3` later.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229051205
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -53,7 +53,20 @@ Spark SQL supports the vast majority of Hive features, such as:
     * View
       * If column aliases are not specified in view definition queries, both Spark and Hive will
         generate alias names, but in different ways. In order for Spark to be able to read views created
    -    by Hive, users should explicitly specify column aliases in view definition queries.
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    nit: We could perhaps simplify the query to:
    ```
    CREATE VIEW v1 AS (SELECT c1 + 1, upper(c2) FROM t1);
    ```
    What do you think?


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by seancxmao <gi...@git.apache.org>.

Github user seancxmao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229150147
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,22 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    Good ideas. I have simplified the example and tested it using Hive 2.3.3 and Spark 2.3.1.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Thanks, @seancxmao.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229063489
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -53,7 +53,20 @@ Spark SQL supports the vast majority of Hive features, such as:
     * View
       * If column aliases are not specified in view definition queries, both Spark and Hive will
         generate alias names, but in different ways. In order for Spark to be able to read views created
    -    by Hive, users should explicitly specify column aliases in view definition queries.
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    @dongjoon-hyun Oh, thanks. Is that because it requires an explicit correlation? Sorry, I don't have a 1.2.2 environment to try it out.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22868


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98163/testReport)** for PR 22868 at commit [`e5b3a11`](https://github.com/apache/spark/commit/e5b3a11c2cedcbbe528cc72d465ab6e27f5215e3).


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by seancxmao <gi...@git.apache.org>.

Github user seancxmao commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    @dongjoon-hyun Do you mean SPARK-25833? Since SPARK-24864 is resolved as Won't Fix, I updated the type, priority, and title of SPARK-25833.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r228776349
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries.
    --- End diff --
    
    +1


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r228766091
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries.
    --- End diff --
    
    @seancxmao Thanks for adding the doc. Could a small example here help illustrate this better?


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98243/
    Test PASSed.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by seancxmao <gi...@git.apache.org>.

Github user seancxmao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229156006
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,22 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    +    ```
    +
    +    Instead, you should create `v1` as below with column aliases explicitly specified.
    +
    +    ```
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1 AS inc_c1, upper(c2) AS upper_c2 FROM t1) t2;
    --- End diff --
    
    Sure, updated as well.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98197 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98197/testReport)** for PR 22868 at commit [`84d9e8c`](https://github.com/apache/spark/commit/84d9e8c20c12e5e3e3150de9a97f581858e3c9bd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Thank you all!


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Can one of the admins verify this patch?


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98197/
    Test PASSed.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by seancxmao <gi...@git.apache.org>.

Github user seancxmao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229155030
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -53,7 +53,20 @@ Spark SQL supports the vast majority of Hive features, such as:
     * View
       * If column aliases are not specified in view definition queries, both Spark and Hive will
         generate alias names, but in different ways. In order for Spark to be able to read views created
    -    by Hive, users should explicitly specify column aliases in view definition queries.
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    It seems Hive 1.x does not allow `(` following `CREATE VIEW ... AS`, while Hive 2.x just works well. The following works on Hive 1.2.1, 1.2.2 and 2.3.3.
    
    ```
    CREATE VIEW v1 AS SELECT c1 + 1, upper(c2) FROM t1;
    ```
    
    Another finding is that the above view is readable by Spark, though the view column names are weird (`_c0`, `_c1`). This is because Spark adds a `Project` between `View` and the view definition query if their output attributes do not match.
    
    ```
    spark-sql> explain extended v1;
    ...
    == Analyzed Logical Plan ==
    _c0: int, _c1: string
    Project [_c0#44, _c1#45]
    +- SubqueryAlias v1
       +- View (`default`.`v1`, [_c0#44,_c1#45])
          +- Project [cast((c1 + 1)#48 as int) AS _c0#44, cast(upper(c2)#49 as string) AS _c1#45] // this is added by AliasViewChild rule
             +- Project [(c1#46 + 1) AS (c1 + 1)#48, upper(c2#47) AS upper(c2)#49]
                +- SubqueryAlias t1
                   +- HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [c1#46, c2#47]
    ...
    ```
    
    But, if column aliases in subqueries of the view definition query are missing, Spark will not be able to read the view.
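
The difference between the two cases can be sketched in Python. This is a simplified model under stated assumptions, not Spark's analyzer: `project_by_position` and `resolve_by_name` are hypothetical helpers, and the sketch assumes that a view created by Hive refers to unaliased subquery columns by Hive's generated names (`_c0`, `_c1`), while Spark names the same columns after their expressions.

```python
from typing import List


def project_by_position(view_schema: List[str], query_output: List[str]) -> List[str]:
    """Map each query output column onto the view column at the same position.

    Models the extra Project seen in the plan above: names may differ,
    only the number of columns has to match.
    """
    if len(view_schema) != len(query_output):
        raise ValueError("view schema and query output differ in length")
    return [f"{src} AS {dst}" for src, dst in zip(query_output, view_schema)]


def resolve_by_name(references: List[str], available: List[str]) -> List[str]:
    """Resolve column references against the names a subquery exposes."""
    missing = [name for name in references if name not in available]
    if missing:
        raise LookupError(f"cannot resolve columns: {missing}")
    return references


# Names Spark derives for the unaliased expressions (from the plan above).
spark_names = ["(c1 + 1)", "upper(c2)"]
# Names Hive recorded for the same columns.
hive_names = ["_c0", "_c1"]

# Top-level case: the name mismatch is bridged by position.
top_level = project_by_position(hive_names, spark_names)

# Subquery case (assumed): the outer query looks up Hive's names and fails.
try:
    resolve_by_name(hive_names, spark_names)
    subquery_readable = True
except LookupError:
    subquery_readable = False
```

With explicit aliases in the subquery, the by-name lookup would find the same names on both sides, so the second case would no longer fail in this model.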


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98163/testReport)** for PR 22868 at commit [`e5b3a11`](https://github.com/apache/spark/commit/e5b3a11c2cedcbbe528cc72d465ab6e27f5215e3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229060706
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -53,7 +53,20 @@ Spark SQL supports the vast majority of Hive features, such as:
     * View
       * If column aliases are not specified in view definition queries, both Spark and Hive will
         generate alias names, but in different ways. In order for Spark to be able to read views created
    -    by Hive, users should explicitly specify column aliases in view definition queries.
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    BTW, @dilipbiswal, the above query `CREATE VIEW v1 AS (SELECT c1 + 1, upper(c2) FROM t1);` seems to fail on Hive 1.2.2.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by seancxmao <gi...@git.apache.org>.

Github user seancxmao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r228845332
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,9 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries.
    --- End diff --
    
    Good idea. I have added an example.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #22868: [SPARK-25833][SQL][DOCS] Update migration guide f...

Posted by dilipbiswal <gi...@git.apache.org>.

Github user dilipbiswal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22868#discussion_r229062732
  
    --- Diff: docs/sql-migration-guide-hive-compatibility.md ---
    @@ -51,6 +51,22 @@ Spark SQL supports the vast majority of Hive features, such as:
     * Explain
     * Partitioned tables including dynamic partition insertion
     * View
    +  * If column aliases are not specified in view definition queries, both Spark and Hive will
    +    generate alias names, but in different ways. In order for Spark to be able to read views created
    +    by Hive, users should explicitly specify column aliases in view definition queries. As an
    +    example, Spark cannot read `v1` created as below by Hive.
    +
    +    ```
    +    CREATE TABLE t1 (c1 INT, c2 STRING);
    +    CREATE VIEW v1 AS SELECT * FROM (SELECT c1 + 1, upper(c2) FROM t1) t2;
    --- End diff --
    
    @dongjoon-hyun I was thinking that calling `upper` on an int column is probably not very intuitive :-)
    What do you think about adding a string literal in the projection?
    
    ```
    SELECT c + 1, upper(d) FROM (SELECT 1 c, 'test' AS d) t1
    ```


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98163/
    Test PASSed.


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98156 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98156/testReport)** for PR 22868 at commit [`e5b3a11`](https://github.com/apache/spark/commit/e5b3a11c2cedcbbe528cc72d465ab6e27f5215e3).


---


[GitHub] spark issue #22868: [SPARK-25833][SQL][DOCS] Update migration guide for Hive...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22868
  
    **[Test build #98243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98243/testReport)** for PR 22868 at commit [`26b4dee`](https://github.com/apache/spark/commit/26b4dee5cd8e4e5fe7e1a34543bad17e05b9b783).


---
