You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2017/10/31 15:03:57 UTC

[GitHub] spark pull request #19622: [SPARK-22306][SQL] alter table schema should not ...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/19622

    [SPARK-22306][SQL] alter table schema should not erase the bucketing metadata at hive side

    ## What changes were proposed in this pull request?
    
    When we alter table schema, we set the new schema to spark `CatalogTable`, convert it to hive table, and finally call `hive.alterTable`. This causes a problem in Spark 2.2, because hive bucketing metedata is not recognized by Spark, which means a Spark `CatalogTable` representing a hive table is always non-bucketed, and when we convert it to hive table and call `hive.alterTable`, the bucketing metadata will be removed.
    
    To fix this bug, we should read out the raw hive table metadata, update its schema, and call `hive.alterTable`. By doing this we can guarantee only the schema is changed, and nothing else.
    
    Note that this bug doesn't exist in the master branch, because we've added hive bucketing support and the hive bucketing metadata can be recognized by Spark. I think we should merge this PR to master too, for code cleanup and reduce the difference between master and 2.2 branch for backporting.
    
    ## How was this patch tested?
    
    new regression test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark infer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19622.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19622
    
----
commit df567cdcaee9483a6c70751be7e0c17366c94fc8
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-10-31T14:54:34Z

    alter table schema should not erase the bucketing metadata at hive side

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    thanks, merging to 2.2!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148441747
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -295,7 +297,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             storage = table.storage.copy(
               locationUri = None,
               properties = storagePropsWithLocation),
    -        schema = table.partitionSchema,
    +        schema = StructType(EMPTY_DATA_SCHEMA ++ table.partitionSchema),
    --- End diff --
    
    I think this should be good for 2.3, but how about keeping it unchanged in 2.2 for safety? I am just afraid this might break the other library who made an assumption in our metastore.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148067066
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -837,20 +849,7 @@ private[hive] object HiveClientImpl {
         val (partCols, schema) = table.schema.map(toHiveColumn).partition { c =>
           table.partitionColumnNames.contains(c.getName)
         }
    -    // after SPARK-19279, it is not allowed to create a hive table with an empty schema,
    -    // so here we should not add a default col schema
    -    if (schema.isEmpty && HiveExternalCatalog.isDatasourceTable(table)) {
    -      // This is a hack to preserve existing behavior. Before Spark 2.0, we do not
    -      // set a default serde here (this was done in Hive), and so if the user provides
    -      // an empty schema Hive would automatically populate the schema with a single
    -      // field "col". However, after SPARK-14388, we set the default serde to
    -      // LazySimpleSerde so this implicit behavior no longer happens. Therefore,
    -      // we need to do it in Spark ourselves.
    -      hiveTable.setFields(
    -        Seq(new FieldSchema("col", "array<string>", "from deserializer")).asJava)
    -    } else {
    -      hiveTable.setFields(schema.asJava)
    -    }
    +    hiveTable.setFields(schema.asJava)
         hiveTable.setPartCols(partCols.asJava)
         userName.foreach(hiveTable.setOwner)
    --- End diff --
    
    @cloud-fan . `owner` seems to be changed here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83267/testReport)** for PR 19622 at commit [`7c9c18e`](https://github.com/apache/spark/commit/7c9c18e2afe35ca16408eab73df1d266ef3b2757).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148064090
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -246,11 +246,11 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
     
           inferredSchema match {
             case Some(dataSchema) =>
    -          val schema = StructType(dataSchema ++ relation.tableMeta.partitionSchema)
               if (inferenceMode == INFER_AND_SAVE) {
    -            updateCatalogSchema(relation.tableMeta.identifier, schema)
    +            updateDataSchema(relation.tableMeta.identifier, dataSchema)
    --- End diff --
    
    owner is in table properties?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83278/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148170423
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -295,7 +297,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             storage = table.storage.copy(
               locationUri = None,
               properties = storagePropsWithLocation),
    -        schema = table.partitionSchema,
    +        schema = StructType(EMPTY_DATA_SCHEMA ++ table.partitionSchema),
    --- End diff --
    
    Don't we need to add the empty data schema in `newHiveCompatibleMetastoreTable`? In existing `HiveClientImpl`, if two conditions are met `schema.isEmpty` and `HiveExternalCatalog.isDatasourceTable(table)`, we add the empty schema then.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan closed the pull request at:

    https://github.com/apache/spark/pull/19622


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83264/testReport)** for PR 19622 at commit [`df567cd`](https://github.com/apache/spark/commit/df567cdcaee9483a6c70751be7e0c17366c94fc8).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83287/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83264/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148060497
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
    @@ -246,11 +246,11 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
     
           inferredSchema match {
             case Some(dataSchema) =>
    -          val schema = StructType(dataSchema ++ relation.tableMeta.partitionSchema)
               if (inferenceMode == INFER_AND_SAVE) {
    -            updateCatalogSchema(relation.tableMeta.identifier, schema)
    +            updateDataSchema(relation.tableMeta.identifier, dataSchema)
    --- End diff --
    
    Hi, @cloud-fan . 
    SPARK-22306 also reports the unexpected change of table `owner`. To resolve SPARK-22306, we need to keep the original `owner` during `INFER_AND_SAVE`. It seems to be the only remaining thing here. Could you consider that in this PR please, too?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83278/testReport)** for PR 19622 at commit [`ca89971`](https://github.com/apache/spark/commit/ca89971dd3264e3739ffceb16c34eb155e614e0c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83325 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83325/testReport)** for PR 19622 at commit [`1309448`](https://github.com/apache/spark/commit/1309448f122de94c2d89808e06e892cd5a374028).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148168917
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
    @@ -148,17 +148,15 @@ abstract class ExternalCatalog
       def alterTable(tableDefinition: CatalogTable): Unit
     
       /**
    -   * Alter the schema of a table identified by the provided database and table name. The new schema
    -   * should still contain the existing bucket columns and partition columns used by the table. This
    -   * method will also update any Spark SQL-related parameters stored as Hive table properties (such
    -   * as the schema itself).
    +   * Alter the data schema of a table identified by the provided database and table name. The new
    +   * data schema should not have conflict column names with the existing partition columns, and
    +   * should still contain all the existing data columns.
        *
        * @param db Database that table to alter schema for exists in
        * @param table Name of table to alter schema for
    -   * @param schema Updated schema to be used for the table (must contain existing partition and
    -   *               bucket columns)
    +   * @param newDataSchema Updated data schema to be used for the table.
        */
    -  def alterTableSchema(db: String, table: String, schema: StructType): Unit
    +  def alterTableDataSchema(db: String, table: String, newDataSchema: StructType): Unit
    --- End diff --
    
    Looks like we don't support dropping columns yet. New description is more clear.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83287/testReport)** for PR 19622 at commit [`e836b2f`](https://github.com/apache/spark/commit/e836b2fb0d9758e6191564682bfdefdbbeddd66a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83267 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83267/testReport)** for PR 19622 at commit [`7c9c18e`](https://github.com/apache/spark/commit/7c9c18e2afe35ca16408eab73df1d266ef3b2757).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148168129
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala ---
    @@ -148,17 +148,15 @@ abstract class ExternalCatalog
       def alterTable(tableDefinition: CatalogTable): Unit
     
       /**
    -   * Alter the schema of a table identified by the provided database and table name. The new schema
    -   * should still contain the existing bucket columns and partition columns used by the table. This
    -   * method will also update any Spark SQL-related parameters stored as Hive table properties (such
    -   * as the schema itself).
    +   * Alter the data schema of a table identified by the provided database and table name. The new
    +   * data schema should not have conflict column names with the existing partition columns, and
    +   * should still contain all the existing data columns.
        *
        * @param db Database that table to alter schema for exists in
        * @param table Name of table to alter schema for
    -   * @param schema Updated schema to be used for the table (must contain existing partition and
    -   *               bucket columns)
    +   * @param newDataSchema Updated data schema to be used for the table.
        */
    -  def alterTableSchema(db: String, table: String, schema: StructType): Unit
    +  def alterTableDataSchema(db: String, table: String, newDataSchema: StructType): Unit
    --- End diff --
    
    From the old description, looks `alterTableSchema` has the capacity to delete existing data columns. But new description clarifies it can't (`... should still contain all the existing data columns.`). Is this we want?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83287/testReport)** for PR 19622 at commit [`e836b2f`](https://github.com/apache/spark/commit/e836b2fb0d9758e6191564682bfdefdbbeddd66a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148171454
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -630,32 +632,33 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
         }
       }
     
    -  override def alterTableSchema(db: String, table: String, schema: StructType): Unit = withClient {
    +  override def alterTableDataSchema(
    +      db: String, table: String, newDataSchema: StructType): Unit = withClient {
         requireTableExists(db, table)
    -    val rawTable = getRawTable(db, table)
    -    // Add table metadata such as table schema, partition columns, etc. to table properties.
    -    val updatedProperties = rawTable.properties ++ tableMetaToTableProps(rawTable, schema)
    -    val withNewSchema = rawTable.copy(properties = updatedProperties, schema = schema)
    -    verifyColumnNames(withNewSchema)
    +    val oldTable = getTable(db, table)
    +    verifyDataSchema(oldTable.identifier, oldTable.tableType, newDataSchema)
    +
    +    val newProps = oldTable.properties.filterNot(_._1.startsWith(DATASOURCE_SCHEMA)) ++
    +      tableMetaToTableProps(oldTable, StructType(newDataSchema ++ oldTable.partitionSchema))
     
    -    if (isDatasourceTable(rawTable)) {
    +    if (isDatasourceTable(oldTable)) {
           // For data source tables, first try to write it with the schema set; if that does not work,
           // try again with updated properties and the partition schema. This is a simplified version of
           // what createDataSourceTable() does, and may leave the table in a state unreadable by Hive
           // (for example, the schema does not match the data source schema, or does not match the
           // storage descriptor).
           try {
    -        client.alterTable(withNewSchema)
    +        client.alterTableDataSchemaAndProps(db, table, newDataSchema, newProps)
           } catch {
             case NonFatal(e) =>
               val warningMessage =
    -            s"Could not alter schema of table  ${rawTable.identifier.quotedString} in a Hive " +
    +            s"Could not alter schema of table  ${oldTable.identifier.quotedString} in a Hive " +
    --- End diff --
    
    nit: an extra space before `${oldTable.identifier.quotedString}`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    LGTM except a comment [here](https://github.com/apache/spark/pull/19622#discussion_r148441747)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83264/testReport)** for PR 19622 at commit [`df567cd`](https://github.com/apache/spark/commit/df567cdcaee9483a6c70751be7e0c17366c94fc8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83325 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83325/testReport)** for PR 19622 at commit [`1309448`](https://github.com/apache/spark/commit/1309448f122de94c2d89808e06e892cd5a374028).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83267/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83325/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148028324
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala ---
    @@ -369,7 +367,7 @@ class SessionCatalog(
         }
     
         // assuming the newSchema has all partition columns at the end as required
    --- End diff --
    
    We should remove this comment.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL] alter table schema should not erase t...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    cc @gatorsmile @dongjoon-hyun @viirya 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19622: [SPARK-22306][SQL][2.2] alter table schema should not er...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19622
  
    **[Test build #83278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83278/testReport)** for PR 19622 at commit [`ca89971`](https://github.com/apache/spark/commit/ca89971dd3264e3739ffceb16c34eb155e614e0c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148145801
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala ---
    @@ -837,20 +849,7 @@ private[hive] object HiveClientImpl {
         val (partCols, schema) = table.schema.map(toHiveColumn).partition { c =>
           table.partitionColumnNames.contains(c.getName)
         }
    -    // after SPARK-19279, it is not allowed to create a hive table with an empty schema,
    -    // so here we should not add a default col schema
    -    if (schema.isEmpty && HiveExternalCatalog.isDatasourceTable(table)) {
    -      // This is a hack to preserve existing behavior. Before Spark 2.0, we do not
    -      // set a default serde here (this was done in Hive), and so if the user provides
    -      // an empty schema Hive would automatically populate the schema with a single
    -      // field "col". However, after SPARK-14388, we set the default serde to
    -      // LazySimpleSerde so this implicit behavior no longer happens. Therefore,
    -      // we need to do it in Spark ourselves.
    -      hiveTable.setFields(
    -        Seq(new FieldSchema("col", "array<string>", "from deserializer")).asJava)
    -    } else {
    -      hiveTable.setFields(schema.asJava)
    -    }
    +    hiveTable.setFields(schema.asJava)
         hiveTable.setPartCols(partCols.asJava)
         userName.foreach(hiveTable.setOwner)
    --- End diff --
    
    We won't hit this branch for `alterSchema`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19622: [SPARK-22306][SQL][2.2] alter table schema should...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19622#discussion_r148219682
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -295,7 +297,7 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
             storage = table.storage.copy(
               locationUri = None,
               properties = storagePropsWithLocation),
    -        schema = table.partitionSchema,
    +        schema = StructType(EMPTY_DATA_SCHEMA ++ table.partitionSchema),
    --- End diff --
    
    according to https://issues.apache.org/jira/browse/SPARK-19279 , I think this new behavior is more reasonable, because empty schema table is problematic to hive and thus not Hive-compatible.  cc @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org