Posted to reviews@spark.apache.org by ueshin <gi...@git.apache.org> on 2018/09/13 05:27:57 UTC

[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

GitHub user ueshin opened a pull request:

    https://github.com/apache/spark/pull/22410

    [SPARK-25418][SQL] The metadata of DataSource table should not include Hive-generated storage properties.

    ## What changes were proposed in this pull request?
    
    When Hive support is enabled, the Hive catalog puts extra storage properties into the table metadata even for DataSource tables, but those tables should not have them.
    
    ## How was this patch tested?
    
    Modified a test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ueshin/apache-spark issues/SPARK-25418/hive_metadata

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22410.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22410
    
----
commit 2484ca61408060f0559f2237327515246b3d92c1
Author: Takuya UESHIN <ue...@...>
Date:   2018-09-13T03:06:31Z

    Remove Hive-generated storage properties.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    **[Test build #96023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96023/testReport)** for PR 22410 at commit [`2484ca6`](https://github.com/apache/spark/commit/2484ca61408060f0559f2237327515246b3d92c1).


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    cc @gatorsmile 


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22410#discussion_r217592782
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1309,6 +1312,8 @@ object HiveExternalCatalog {
     
       val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"
     
    +  val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
    --- End diff --
    
    Actually, this is the only Hive-generated storage property I think we should exclude for now, but we might have more in the future, so I'd say "properties", and we can add them to this set when that happens. WDYT?
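
For illustration, the idea of keeping a set of keys to exclude and filtering storage properties against it could be sketched as below. This is a minimal, self-contained sketch, not the PR's code: the object name `StoragePropsSketch`, the helper `withoutHiveGenerated`, and the literal key `"serialization.format"` are assumptions made for the example. It uses `filter` rather than the diff's `filterKeys` only so that it behaves the same across Scala versions.

```scala
// Illustrative sketch only; these are hypothetical stand-ins, not the Spark source.
object StoragePropsSketch {
  // Assumed key name for the Hive-generated property.
  val SERIALIZATION_FORMAT = "serialization.format"

  // A Set[String] is also a String => Boolean, so it can serve as a membership test.
  val HIVE_GENERATED_STORAGE_PROPERTIES: Set[String] = Set(SERIALIZATION_FORMAT)

  // Keep every entry whose key is not in the Hive-generated set.
  def withoutHiveGenerated(props: Map[String, String]): Map[String, String] =
    props.filter { case (key, _) => !HIVE_GENERATED_STORAGE_PROPERTIES(key) }
}
```

Under these assumptions, excluding another property later would only mean adding its key to the set; the filtering code would stay unchanged.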


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3077/


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by dilipbiswal <gi...@git.apache.org>.
Github user dilipbiswal commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    retest this please


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22410#discussion_r217603253
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1309,6 +1312,8 @@ object HiveExternalCatalog {
     
       val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"
     
    +  val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
    --- End diff --
    
    We can add more in the future. Basically, these properties are useless to Spark data source tables. 


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22410#discussion_r217669057
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -806,6 +807,8 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           updateLocationInStorageProps(table, newPath = None).copy(
             locationUri = tableLocation.map(CatalogUtils.stringToURI(_)))
         }
    +    val storageWithoutHiveGeneratedProperties = storageWithLocation.copy(
    +      properties = storageWithLocation.properties.filterKeys(!HIVE_GENERATED_STORAGE_PROPERTIES(_)))
    --- End diff --
    
    Shall we do this in `HiveClientImpl`? IIRC we filter out some table props there. 


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96023/


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3074/


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    LGTM
    
    Thanks! Merged to master.


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    **[Test build #96026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96026/testReport)** for PR 22410 at commit [`2484ca6`](https://github.com/apache/spark/commit/2484ca61408060f0559f2237327515246b3d92c1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22410#discussion_r217672194
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -806,6 +807,8 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
           updateLocationInStorageProps(table, newPath = None).copy(
             locationUri = tableLocation.map(CatalogUtils.stringToURI(_)))
         }
    +    val storageWithoutHiveGeneratedProperties = storageWithLocation.copy(
    +      properties = storageWithLocation.properties.filterKeys(!HIVE_GENERATED_STORAGE_PROPERTIES(_)))
    --- End diff --
    
    In `HiveClientImpl`, we don't yet know whether the table is a Hive table or a DataSource table. Can we remove the props even for Hive tables?
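
For illustration, the concern here is that the filtering should apply only once the table type is known. A minimal sketch of that shape follows. Everything in it is hypothetical: `TableSketch`, `isDataSourceTable`, `restore`, and the rule that a provider other than `"hive"` marks a DataSource table are assumptions for the example, not Spark's actual classes or logic.

```scala
// Illustrative sketch only; TableSketch and its fields are hypothetical, not Spark's CatalogTable.
object TableTypeSketch {
  final case class TableSketch(provider: Option[String], storageProps: Map[String, String])

  // Assumed key name for the Hive-generated storage property.
  val HiveGenerated: Set[String] = Set("serialization.format")

  // Assumption: a table counts as a DataSource table when it has a provider other than "hive".
  def isDataSourceTable(t: TableSketch): Boolean =
    t.provider.exists(_.toLowerCase != "hive")

  // Strip Hive-generated storage properties only for DataSource tables;
  // Hive tables are returned untouched.
  def restore(t: TableSketch): TableSketch =
    if (isDataSourceTable(t)) {
      t.copy(storageProps = t.storageProps.filter { case (k, _) => !HiveGenerated(k) })
    } else {
      t
    }
}
```

In this sketch a table with provider `"parquet"` would lose the `serialization.format` entry, while a table with provider `"hive"` would keep all of its storage properties.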


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22410#discussion_r217623551
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1309,6 +1312,8 @@ object HiveExternalCatalog {
     
       val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"
     
    +  val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
    --- End diff --
    
    Got it, @ueshin and @gatorsmile.


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    **[Test build #96023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96023/testReport)** for PR 22410 at commit [`2484ca6`](https://github.com/apache/spark/commit/2484ca61408060f0559f2237327515246b3d92c1).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22410#discussion_r217590735
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
    @@ -1309,6 +1312,8 @@ object HiveExternalCatalog {
     
       val CREATED_SPARK_VERSION = SPARK_SQL_PREFIX + "create.version"
     
    +  val HIVE_GENERATED_STORAGE_PROPERTIES = Set(SERIALIZATION_FORMAT)
    --- End diff --
    
    @ueshin, the title says `Hive-generated storage properties`, but this PR excludes only this one. Could you add more? Otherwise, can we make this a SQLConf so that it is configurable?


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    **[Test build #96026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96026/testReport)** for PR 22410 at commit [`2484ca6`](https://github.com/apache/spark/commit/2484ca61408060f0559f2237327515246b3d92c1).


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96026/


---



[GitHub] spark issue #22410: [SPARK-25418][SQL] The metadata of DataSource table shou...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22410
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22410: [SPARK-25418][SQL] The metadata of DataSource tab...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22410


---
