You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by budde <gi...@git.apache.org> on 2017/03/10 21:12:34 UTC

[GitHub] spark pull request #17249: [SPARK-19611][SQL] Preserve metastore field order...

GitHub user budde opened a pull request:

    https://github.com/apache/spark/pull/17249

    [SPARK-19611][SQL] Preserve metastore field order when merging inferred schema

    ## What changes were proposed in this pull request?
    
    The ```HiveMetastoreCatalog.mergeWithMetastoreSchema()``` method added in #16944 may
    not preserve the same field order as the metastore schema in some cases, which can cause
    queries to fail. This change ensures that the metastore field order is preserved.
    
    ## How was this patch tested?
    
    A test for ensuring that metastore order is preserved was added to ```HiveSchemaInferenceSuite.```
    The particular failure usecase from #16944 was tested manually as well.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/budde/spark PreserveMetastoreFieldOrder

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17249.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17249
    
----
commit 99eb2dc9b07e953249b4ece6ea3a44ecebc9287b
Author: Budde <bu...@amazon.com>
Date:   2017-03-10T21:04:22Z

    [SPARK-19611][SQL] Preserve metastore field order when merging inferred schema
    
    The ```HiveMetastoreCatalog.mergeWithMetastoreSchema()``` method added in #16944 may
    not preserve the same field order as the metastore schema in some cases, which can cause
    queries to fail. This change ensures that the metastore field order is preserved.
    
    A test for ensuring that metastore order is preserved was added to ```HiveSchemaInferenceSuite.```
    The particular failure usecase from #16944 was tested manually as well.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74337/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    thanks, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by budde <gi...@git.apache.org>.
Github user budde commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    I've verified that it does using the same procedure but I'll let @dongjoon-hyun confirm as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    **[Test build #74337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74337/testReport)** for PR 17249 at commit [`99eb2dc`](https://github.com/apache/spark/commit/99eb2dc9b07e953249b4ece6ea3a44ecebc9287b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17249: [SPARK-19611][SQL] Preserve metastore field order...

Posted by budde <gi...@git.apache.org>.
Github user budde commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17249#discussion_r105499679
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -356,13 +356,10 @@ private[hive] object HiveMetastoreCatalog {
           .filterKeys(!inferredSchema.map(_.name.toLowerCase).contains(_))
           .values
           .filter(_.nullable)
    -
         // Merge missing nullable fields to inferred schema and build a case-insensitive field map.
         val inferredFields = StructType(inferredSchema ++ missingNullables)
           .map(f => f.name.toLowerCase -> f).toMap
    -    StructType(metastoreFields.map { case(name, field) =>
    -      field.copy(name = inferredFields(name).name)
    -    }.toSeq)
    +    StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
    --- End diff --
    
    This should ensure the proper ordering is used. Iterating over the ```metastoreFields``` map isn't guaranteed to maintain the original ordering.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17249: [SPARK-19611][SQL] Preserve metastore field order...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17249#discussion_r105499863
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -356,13 +356,10 @@ private[hive] object HiveMetastoreCatalog {
           .filterKeys(!inferredSchema.map(_.name.toLowerCase).contains(_))
           .values
           .filter(_.nullable)
    -
         // Merge missing nullable fields to inferred schema and build a case-insensitive field map.
         val inferredFields = StructType(inferredSchema ++ missingNullables)
           .map(f => f.name.toLowerCase -> f).toMap
    -    StructType(metastoreFields.map { case(name, field) =>
    -      field.copy(name = inferredFields(name).name)
    -    }.toSeq)
    +    StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
    --- End diff --
    
    Yep. I'm still building this PR locally, but it looks good logically. :) I'll post my result soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17249: [SPARK-19611][SQL] Preserve metastore field order...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17249#discussion_r105499333
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -356,13 +356,10 @@ private[hive] object HiveMetastoreCatalog {
           .filterKeys(!inferredSchema.map(_.name.toLowerCase).contains(_))
           .values
           .filter(_.nullable)
    -
         // Merge missing nullable fields to inferred schema and build a case-insensitive field map.
         val inferredFields = StructType(inferredSchema ++ missingNullables)
           .map(f => f.name.toLowerCase -> f).toMap
    -    StructType(metastoreFields.map { case(name, field) =>
    -      field.copy(name = inferredFields(name).name)
    -    }.toSeq)
    +    StructType(metastoreSchema.map(f => f.copy(name = inferredFields(f.name).name)))
    --- End diff --
    
    I see. Now `metastoreSchema` is used instead of `metastoreFields`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17249: [SPARK-19611][SQL] Preserve metastore field order...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17249


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    Sure, I'll test locally, too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    **[Test build #74337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74337/testReport)** for PR 17249 at commit [`99eb2dc`](https://github.com/apache/spark/commit/99eb2dc9b07e953249b4ece6ea3a44ecebc9287b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    @dongjoon-hyun can you verify if it fixes your problem?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by budde <gi...@git.apache.org>.
Github user budde commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    Pinging @cloud-fan and @dongjoon-hyun


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17249: [SPARK-19611][SQL] Preserve metastore field order when m...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/17249
  
    Great. I also verified the patch.
    ```scala
    scala> sql("SELECT a, b FROM t1").show
    +---+---+
    |  a|  b|
    +---+---+
    |100|200|
    +---+---+
    
    
    scala> sql("SELECT * FROM t1").show
    +---+---+-----+---+----+
    |  a|  b|dummy|day|hour|
    +---+---+-----+---+----+
    |100|200| null|  1|  01|
    +---+---+-----+---+----+
    ```
    
    Thank you so much, @budde and @cloud-fan .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org