Posted to reviews@spark.apache.org by vanzin <gi...@git.apache.org> on 2017/08/02 22:41:31 UTC

[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

GitHub user vanzin opened a pull request:

    https://github.com/apache/spark/pull/18824

    [SPARK-21617][SQL] Store correct metadata in Hive for altered DS table.

    This change fixes two issues:
    - when loading table metadata from Hive, restore the "provider" field of
      CatalogTable so DS tables can be identified.
    - when altering a DS table in the Hive metastore, make sure to not alter
      the table's schema, since the DS table's schema is stored as a table
      property in those cases.
    
    Also added a new unit test for this issue, which fails without this change.
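
    A minimal sketch of the failing scenario (a hypothetical example; the
    table and column names are illustrative, not taken from the new test):

    ```scala
    // Create a data source (DS) table. Its real schema lives in table
    // properties; the Hive-visible schema is Spark-internal bookkeeping.
    spark.sql("CREATE TABLE t (a INT) USING parquet")

    // This goes through HiveExternalCatalog.alterTableSchema. Without the
    // fix, the new schema can be written into the Hive table itself,
    // leaving the stored metadata inconsistent with what Spark expects.
    spark.sql("ALTER TABLE t ADD COLUMNS (b STRING)")

    // Reading the table back should show both columns.
    spark.table("t").printSchema()
    ```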


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/vanzin/spark SPARK-21617

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18824.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18824
    
----
commit aae3abd673adc7ff939d842e49d566fa722403a3
Author: Marcelo Vanzin <va...@cloudera.com>
Date:   2017-08-02T21:47:34Z

    [SPARK-21617][SQL] Store correct metadata in Hive for altered DS table.
    
    This change fixes two issues:
    - when loading table metadata from Hive, restore the "provider" field of
      CatalogTable so DS tables can be identified.
    - when altering a DS table in the Hive metastore, make sure to not alter
      the table's schema, since the DS table's schema is stored as a table
      property in those cases.
    
    Also added a new unit test for this issue, which fails without this change.

----




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    **[Test build #80182 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80182/testReport)** for PR 18824 at commit [`aae3abd`](https://github.com/apache/spark/commit/aae3abd673adc7ff939d842e49d566fa722403a3).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131015612
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -616,15 +616,24 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // Add table metadata such as table schema, partition columns, etc. to table properties.
         val updatedTable = withNewSchema.copy(
           properties = withNewSchema.properties ++ tableMetaToTableProps(withNewSchema))
    +
    +    // If it's a data source table, make sure the original schema is left unchanged; the
    +    // actual schema is recorded as a table property.
    +    val tableToStore = if (DDLUtils.isDatasourceTable(updatedTable)) {
    +      updatedTable.copy(schema = rawTable.schema)
    +    } else {
    +      updatedTable
    +    }
    +
         try {
    -      client.alterTable(updatedTable)
    +      client.alterTable(tableToStore)
         } catch {
           case NonFatal(e) =>
             val warningMessage =
               s"Could not alter schema of table  ${rawTable.identifier.quotedString} in a Hive " +
                 "compatible way. Updating Hive metastore in Spark SQL specific format."
             logWarning(warningMessage, e)
    -        client.alterTable(updatedTable.copy(schema = updatedTable.partitionSchema))
    +        client.alterTable(updatedTable.copy(schema = tableToStore.partitionSchema))
    --- End diff --
    
    This is the exception handling code I mentioned in the bug report, and it seems very suspicious. I was half tempted to just remove it, but maybe someone can explain to me why this code makes sense.




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131210013
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -616,15 +616,24 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // Add table metadata such as table schema, partition columns, etc. to table properties.
         val updatedTable = withNewSchema.copy(
           properties = withNewSchema.properties ++ tableMetaToTableProps(withNewSchema))
    +
    +    // If it's a data source table, make sure the original schema is left unchanged; the
    +    // actual schema is recorded as a table property.
    +    val tableToStore = if (DDLUtils.isDatasourceTable(updatedTable)) {
    +      updatedTable.copy(schema = rawTable.schema)
    --- End diff --
    
    Hmm, I see that this will break DS tables created with `newHiveCompatibleMetastoreTable` instead of `newSparkSQLSpecificMetastoreTable`.
    
    For the former, the only thing I can see that could be used to identify the case is the presence of serde properties in the table metadata. That could replace the `DDLUtils.isDatasourceTable(updatedTable)` check to see whether the schema needs to be updated.
    
    For the latter case, I see that `newSparkSQLSpecificMetastoreTable` stores the partition schema as the table's schema (which sort of explains the weird exception handling I saw). So this code is only correct if the partition schema cannot change. Where is the partition schema for a DS table defined? Is it under the control of the user (or the data source implementation)? Because if it can change, you can run into pretty much the same issue.
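
    For reference, here is roughly what `newSparkSQLSpecificMetastoreTable`
    seems to do with the schema (heavily paraphrased from my reading, not
    the exact source):

    ```scala
    // Only the partition columns are stored as the Hive-visible schema;
    // the full schema is serialized into table properties
    // (spark.sql.sources.schema.part.N).
    def newSparkSQLSpecificMetastoreTable(table: CatalogTable): CatalogTable = {
      table.copy(schema = table.partitionSchema)
    }
    ```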





[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    **[Test build #80217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80217/testReport)** for PR 18824 at commit [`cc7cd95`](https://github.com/apache/spark/commit/cc7cd955e262e4b45730a6a909129b0c059b15ac).




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80217/
    Test FAILed.




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131056558
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -616,15 +616,24 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // Add table metadata such as table schema, partition columns, etc. to table properties.
         val updatedTable = withNewSchema.copy(
           properties = withNewSchema.properties ++ tableMetaToTableProps(withNewSchema))
    +
    +    // If it's a data source table, make sure the original schema is left unchanged; the
    +    // actual schema is recorded as a table property.
    +    val tableToStore = if (DDLUtils.isDatasourceTable(updatedTable)) {
    +      updatedTable.copy(schema = rawTable.schema)
    --- End diff --
    
    We do support `ALTER TABLE ADD COLUMN`, which relies on `alterTableSchema`. Data source tables can be read by Hive when they are stored in a Hive compatible way, so I think we should not leave the schema unchanged.
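
    For example (an illustrative sketch, assuming the table was stored in a
    Hive compatible layout):

    ```scala
    spark.sql("CREATE TABLE t (a INT) USING parquet")
    spark.sql("ALTER TABLE t ADD COLUMNS (b STRING)")
    // If the Hive-side schema is left unchanged, a Hive client reading the
    // same metastore entry still sees only column `a`.
    ```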




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131059637
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -413,7 +414,10 @@ private[hive] class HiveClientImpl(
             unsupportedFeatures += "partitioned view"
           }
     
    -      val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
    +      val properties = Option(h.getParameters).map(_.asScala.toMap).getOrElse(Map())
    +
    +      val provider = properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
    +        .orElse(Some(DDLUtils.HIVE_PROVIDER))
    --- End diff --
    
    Oh, never mind. Looks like we access the key `DATASOURCE_PROVIDER` in `table.properties` for that purpose, so this should be safe. In any case, we set `provider` on the `CatalogTable` later, when restoring a table read from the metastore.

    Another concern is that we previously didn't restore `provider` for views; please refer to https://github.com/apache/spark/blob/f18b905f6cace7686ef169fda7de474079d0af23/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L682. With this change, we will set `provider` to `HIVE_PROVIDER` for views too.
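
    One way to keep the old behavior for views would be to skip the fallback
    there (a sketch; it assumes the Hive table type has already been mapped
    to a `CatalogTableType`):

    ```scala
    val provider = if (tableType == CatalogTableType.VIEW) {
      None  // previous behavior: no provider restored for views
    } else {
      properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
        .orElse(Some(DDLUtils.HIVE_PROVIDER))
    }
    ```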
    





[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131048805
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -413,7 +414,10 @@ private[hive] class HiveClientImpl(
             unsupportedFeatures += "partitioned view"
           }
     
    -      val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
    +      val properties = Option(h.getParameters).map(_.asScala.toMap).getOrElse(Map())
    +
    +      val provider = properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
    +        .orElse(Some(DDLUtils.HIVE_PROVIDER))
    --- End diff --
    
    Previously we didn't store a provider for Hive serde tables. Some existing logic that decides whether a table retrieved from the metastore is a data source table may be broken by this change.
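
    For context, the main check I have in mind is roughly this (paraphrased
    from `DDLUtils`, not the exact source):

    ```scala
    // A table counts as a data source table only when a provider is set
    // and it is not the Hive provider; any logic built on a check like
    // this is what the fallback to HIVE_PROVIDER could affect.
    def isDatasourceTable(table: CatalogTable): Boolean = {
      table.provider.isDefined && table.provider.get != DDLUtils.HIVE_PROVIDER
    }
    ```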




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131047358
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -616,15 +616,24 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // Add table metadata such as table schema, partition columns, etc. to table properties.
         val updatedTable = withNewSchema.copy(
           properties = withNewSchema.properties ++ tableMetaToTableProps(withNewSchema))
    +
    +    // If it's a data source table, make sure the original schema is left unchanged; the
    +    // actual schema is recorded as a table property.
    +    val tableToStore = if (DDLUtils.isDatasourceTable(updatedTable)) {
    +      updatedTable.copy(schema = rawTable.schema)
    +    } else {
    +      updatedTable
    +    }
    +
         try {
    -      client.alterTable(updatedTable)
    +      client.alterTable(tableToStore)
         } catch {
           case NonFatal(e) =>
             val warningMessage =
               s"Could not alter schema of table  ${rawTable.identifier.quotedString} in a Hive " +
                 "compatible way. Updating Hive metastore in Spark SQL specific format."
             logWarning(warningMessage, e)
    -        client.alterTable(updatedTable.copy(schema = updatedTable.partitionSchema))
    +        client.alterTable(updatedTable.copy(schema = tableToStore.partitionSchema))
    --- End diff --
    
    I think this part is directly related to the logic that converts the table metadata to the Spark SQL specific format: https://github.com/apache/spark/blob/f18b905f6cace7686ef169fda7de474079d0af23/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala#L290-L301
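
    Roughly, that create-table path chooses between two layouts (heavily
    paraphrased; see the linked source for the real logic):

    ```scala
    // Use a Hive compatible layout when the provider maps to a known Hive
    // serde; otherwise fall back to the Spark SQL specific layout, which
    // stores the partition schema as the Hive-visible schema.
    val hiveTable = HiveSerDe.sourceToSerDe(provider) match {
      case Some(serde) => newHiveCompatibleMetastoreTable(serde)
      case None        => newSparkSQLSpecificMetastoreTable()
    }
    ```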
    





[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin closed the pull request at:

    https://github.com/apache/spark/pull/18824




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    I reworked the patch to try to merge the "create table" and "alter table" paths, so they both do the translation the same way.
    
    There are still some test failures, but I wanted to get this up here for you guys to take a look at while I fix those.




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    Hmm, after I made some changes to the test, the whole test suite is failing (although the tests pass when run individually). I'll work on that, but the fix itself, other than the test, should be correct.




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131056819
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -616,15 +616,24 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         // Add table metadata such as table schema, partition columns, etc. to table properties.
         val updatedTable = withNewSchema.copy(
           properties = withNewSchema.properties ++ tableMetaToTableProps(withNewSchema))
    +
    +    // If it's a data source table, make sure the original schema is left unchanged; the
    +    // actual schema is recorded as a table property.
    +    val tableToStore = if (DDLUtils.isDatasourceTable(updatedTable)) {
    +      updatedTable.copy(schema = rawTable.schema)
    --- End diff --
    
    I just checked the JIRA description. This sounds like a bug we need to resolve. It is a bit complex to fix; we need to follow what we did for `create table`. cc @xwu0226, please help @vanzin address this issue.




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    **[Test build #80182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80182/testReport)** for PR 18824 at commit [`aae3abd`](https://github.com/apache/spark/commit/aae3abd673adc7ff939d842e49d566fa722403a3).




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80182/
    Test FAILed.




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    **[Test build #80217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80217/testReport)** for PR 18824 at commit [`cc7cd95`](https://github.com/apache/spark/commit/cc7cd955e262e4b45730a6a909129b0c059b15ac).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131198143
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -413,7 +414,10 @@ private[hive] class HiveClientImpl(
             unsupportedFeatures += "partitioned view"
           }
     
    -      val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
    +      val properties = Option(h.getParameters).map(_.asScala.toMap).getOrElse(Map())
    +
    +      val provider = properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
    +        .orElse(Some(DDLUtils.HIVE_PROVIDER))
    --- End diff --
    
    > Maybe this is redundant.
    
    This was definitely not redundant in my testing: the `provider` field of the metadata loaded from the metastore in `HiveExternalCatalog.alterTableSchema` was not set when I debugged this. In fact, the test I wrote fails if I remove this code (or comment out the line that sets "provider" a few lines below).
    
    Perhaps some other part of the code sets it in a different code path, but this would make that part of the code redundant, not the other way around.




[GitHub] spark pull request #18824: [SPARK-21617][SQL] Store correct metadata in Hive...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18824#discussion_r131211342
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -413,7 +414,10 @@ private[hive] class HiveClientImpl(
             unsupportedFeatures += "partitioned view"
           }
     
    -      val properties = Option(h.getParameters).map(_.asScala.toMap).orNull
    +      val properties = Option(h.getParameters).map(_.asScala.toMap).getOrElse(Map())
    +
    +      val provider = properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER)
    +        .orElse(Some(DDLUtils.HIVE_PROVIDER))
    --- End diff --
    
    The restoring you mention is done in `HiveExternalCatalog.restoreTableMetadata`. Let me see if I can use that instead of making this change.
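
    For reference, the provider-restoring part of `restoreTableMetadata` as
    I read it (paraphrased, not the exact source):

    ```scala
    // The provider comes from a table property: absent means a Hive serde
    // table, present means a data source table.
    val restored = table.properties.get(HiveExternalCatalog.DATASOURCE_PROVIDER) match {
      case None           => table.copy(provider = Some(DDLUtils.HIVE_PROVIDER))
      case Some(provider) => table.copy(provider = Some(provider))
    }
    ```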




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    I updated the bug; let me close this for now while I figure out why that exception is happening.




[GitHub] spark issue #18824: [SPARK-21617][SQL] Store correct metadata in Hive for al...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/18824
  
    FYI, I'm rebuilding the environment where I found the bug, to see why the code was failing even with the exception handler. I'll update the bug if necessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org