You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by wangyum <gi...@git.apache.org> on 2018/08/11 15:57:58 UTC

[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22078

    [SPARK-25085][SQL] Insert overwrite a non-partitioned table should not delete table folder

    ## What changes were proposed in this pull request?
    
    Insert overwrite a `non-partitioned` table should not delete table folder because it may contains information like ACL entries.
    This pr fix this issue, thus the data source table and the hive table have the same behavior.
    ## How was this patch tested?
    
    unit tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25085

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22078.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22078
    
----
commit 36a545ce96c8d43c59584f82c3e13854bd2a21ed
Author: Yuming Wang <yu...@...>
Date:   2018-08-11T15:39:35Z

    Fix SPARK-25085

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94618 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94618/testReport)** for PR 22078 at commit [`36a545c`](https://github.com/apache/spark/commit/36a545ce96c8d43c59584f82c3e13854bd2a21ed).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    
    Number files | Delete directory elapsed time(milliseconds) | Truncate directory elapsed   time(milliseconds)
    -- | -- | --
    200 | 7 | 319
    5000 | 48 | 7810
    10000 | 90 | 16427
    100000 | 736 | 194470
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    cc @cloud-fan @dongjoon-hyun  @gatorsmile 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2299/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94925/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    yes, thanks for the benchmark.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94692/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94636/testReport)** for PR 22078 at commit [`1b1158c`](https://github.com/apache/spark/commit/1b1158c594edd7fd0a10038095145950b8e45978).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94924/testReport)** for PR 22078 at commit [`bef1d17`](https://github.com/apache/spark/commit/bef1d17ced8865c3e95eaa451424e153b4b7214a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94618 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94618/testReport)** for PR 22078 at commit [`36a545c`](https://github.com/apache/spark/commit/36a545ce96c8d43c59584f82c3e13854bd2a21ed).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94924/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r228882841
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -626,6 +626,14 @@ object SQLConf {
         .stringConf
         .createWithDefault("parquet")
     
    +  val DATA_SOURCE_TABLE_INHERIT_PERMS = buildConf("spark.sql.datasource.table.inherit.perms")
    +    .internal()
    +    .doc("Set this to true if the table directories should be inheriting the permission " +
    +      "of the warehouse or database directory " +
    --- End diff --
    
    Hive configuration doc itself doesn't looks quite clear to me as well actually.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    I think about how to improve performance.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94618/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209940807
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -261,4 +273,67 @@ case class InsertIntoHadoopFsRelationCommand(
           }
         }.toMap
       }
    +
    +  private def isExtendedAclEnabled(hadoopConf: Configuration): Boolean =
    +    hadoopConf.getBoolean(DFS_NAMENODE_ACLS_ENABLED_KEY, DFS_NAMENODE_ACLS_ENABLED_DEFAULT)
    +
    +  private def getFullFileStatus(
    +      conf: SQLConf,
    +      hadoopConf: Configuration,
    +      fs: FileSystem,
    +      file: Path): (String, FsPermission, AclStatus) = {
    +    if (conf.isDataSouceTableInheritPerms && fs.exists(file)) {
    +      val fileStatus = fs.getFileStatus(file)
    +      val aclStatus = if (isExtendedAclEnabled(hadoopConf)) fs.getAclStatus(file) else null
    +      (fileStatus.getGroup, fileStatus.getPermission, aclStatus)
    +    } else {
    +      (null, null, null)
    +    }
    +  }
    +
    +  private def setFullFileStatus(
    +      hadoopConf: Configuration,
    +      group: String,
    +      permission: FsPermission,
    +      aclStatus: AclStatus,
    +      fs: FileSystem,
    +      target: Path): Unit = {
    +    try {
    +      // use FsShell to change group, permissions, and extended ACL's recursively
    +      val fsShell = new FsShell
    +      fsShell.setConf(hadoopConf)
    +      fsShell.run(Array[String]("-chgrp", "-R", group, target.toString))
    --- End diff --
    
    Well, for things like setting the group/user you can use `fs.setOwner` for instance


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209954635
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -626,6 +626,14 @@ object SQLConf {
         .stringConf
         .createWithDefault("parquet")
     
    +  val DATA_SOURCE_TABLE_INHERIT_PERMS = buildConf("spark.sql.datasource.table.inherit.perms")
    +    .internal()
    +    .doc("Set this to true if the table directories should be inheriting the permission " +
    +      "of the warehouse or database directory " +
    --- End diff --
    
    Copy from `hive.warehouse.subdir.inherit.perms`: https://github.com/apache/hive/blob/rel/release-1.2.2/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java#L1729


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2300/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94925 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94925/testReport)** for PR 22078 at commit [`f1574d5`](https://github.com/apache/spark/commit/f1574d5a3c4bfbd1a202154e69cff0dc81283e35).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94636/testReport)** for PR 22078 at commit [`1b1158c`](https://github.com/apache/spark/commit/1b1158c594edd7fd0a10038095145950b8e45978).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94692/testReport)** for PR 22078 at commit [`d5dea08`](https://github.com/apache/spark/commit/d5dea089bfc86a7776d53435cdfd2af6f7de0b6a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209859644
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -626,6 +626,14 @@ object SQLConf {
         .stringConf
         .createWithDefault("parquet")
     
    +  val DATA_SOURCE_TABLE_INHERIT_PERMS = buildConf("spark.sql.datasource.table.inherit.perms")
    +    .internal()
    +    .doc("Set this to true if the table directories should be inheriting the permission " +
    +      "of the warehouse or database directory " +
    --- End diff --
    
    I think this is not very clear: it seems to me that the permissions are inherited from the base directory where the table is going to be written. On the other hand, instead, the permissions are inherited from the previous table directory which was deleted, right? Can we make this more clear?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r228881996
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -70,7 +76,6 @@ case class InsertIntoHadoopFsRelationCommand(
         val hadoopConf = sparkSession.sessionState.newHadoopConfWithOptions(options)
         val fs = outputPath.getFileSystem(hadoopConf)
         val qualifiedOutputPath = outputPath.makeQualified(fs.getUri, fs.getWorkingDirectory)
    -
    --- End diff --
    
    I would remove unrelated newline changes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2092/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94924/testReport)** for PR 22078 at commit [`bef1d17`](https://github.com/apache/spark/commit/bef1d17ced8865c3e95eaa451424e153b4b7214a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r226875401
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala ---
    @@ -257,7 +257,102 @@ class HiveCatalogedDDLSuite extends DDLSuite with TestHiveSingleton with BeforeA
             spark.sql("ALTER TABLE t1 ADD COLUMNS (newcol1 STRUCT<`$col1`:STRING, col2:Int>)")
           }.getMessage
           assert(err.contains("Cannot recognize hive type string:"))
    -   }
    +    }
    +  }
    +
    +  test("Normal table should inherit database permission") {
    --- End diff --
    
    @wangyum . This inherits the parent directory properties and the property name is `DATA_SOURCE_TABLE_INHERIT_PERMS`. Hence, it's not Hive-catalog only stuff.
    
    It would be great if we can have this test coverage on `InMemoryCatalogedDDLSuite`, not only `HiveCatalogedDDLSuite`. We need some refactoring and the following will be the only difference.
    ```scala
    spark.sql("create table datasource_table(a int) using parquet")
    spark.sql("create table hive_table(a int)")
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209859205
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -261,4 +273,67 @@ case class InsertIntoHadoopFsRelationCommand(
           }
         }.toMap
       }
    +
    +  private def isExtendedAclEnabled(hadoopConf: Configuration): Boolean =
    +    hadoopConf.getBoolean(DFS_NAMENODE_ACLS_ENABLED_KEY, DFS_NAMENODE_ACLS_ENABLED_DEFAULT)
    +
    +  private def getFullFileStatus(
    +      conf: SQLConf,
    +      hadoopConf: Configuration,
    +      fs: FileSystem,
    +      file: Path): (String, FsPermission, AclStatus) = {
    +    if (conf.isDataSouceTableInheritPerms && fs.exists(file)) {
    +      val fileStatus = fs.getFileStatus(file)
    +      val aclStatus = if (isExtendedAclEnabled(hadoopConf)) fs.getAclStatus(file) else null
    +      (fileStatus.getGroup, fileStatus.getPermission, aclStatus)
    +    } else {
    +      (null, null, null)
    +    }
    +  }
    +
    +  private def setFullFileStatus(
    +      hadoopConf: Configuration,
    +      group: String,
    +      permission: FsPermission,
    +      aclStatus: AclStatus,
    +      fs: FileSystem,
    +      target: Path): Unit = {
    +    try {
    +      // use FsShell to change group, permissions, and extended ACL's recursively
    +      val fsShell = new FsShell
    +      fsShell.setConf(hadoopConf)
    +      fsShell.run(Array[String]("-chgrp", "-R", group, target.toString))
    --- End diff --
    
    can we avoid using the shell ad rather use the programmatic API? eg. `fs.setOwner`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209430770
  
    --- Diff: core/src/main/scala/org/apache/spark/internal/io/FileCommitProtocol.scala ---
    @@ -124,6 +124,25 @@ abstract class FileCommitProtocol {
         fs.delete(path, recursive)
       }
     
    +  /**
    +   * Specifies that a directory should be truncated with the commit of this job. The default
    +   * implementation deletes the file immediately.
    +   *
    +   * Return true if this directory is empty or deleted all files in this directory, otherwise false.
    +   */
    +  def truncateDirectoryWithJob(fs: FileSystem, directory: Path, recursive: Boolean): Boolean = {
    +    assert(fs.isDirectory(directory))
    +    val listStatus = fs.listStatus(directory)
    +    if (listStatus.isEmpty) {
    +      true
    +    } else {
    +      val result = listStatus.map { f =>
    --- End diff --
    
    what about `forall`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by wangyum <gi...@git.apache.org>.
Github user wangyum commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209934062
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -261,4 +273,67 @@ case class InsertIntoHadoopFsRelationCommand(
           }
         }.toMap
       }
    +
    +  private def isExtendedAclEnabled(hadoopConf: Configuration): Boolean =
    +    hadoopConf.getBoolean(DFS_NAMENODE_ACLS_ENABLED_KEY, DFS_NAMENODE_ACLS_ENABLED_DEFAULT)
    +
    +  private def getFullFileStatus(
    +      conf: SQLConf,
    +      hadoopConf: Configuration,
    +      fs: FileSystem,
    +      file: Path): (String, FsPermission, AclStatus) = {
    +    if (conf.isDataSouceTableInheritPerms && fs.exists(file)) {
    +      val fileStatus = fs.getFileStatus(file)
    +      val aclStatus = if (isExtendedAclEnabled(hadoopConf)) fs.getAclStatus(file) else null
    +      (fileStatus.getGroup, fileStatus.getPermission, aclStatus)
    +    } else {
    +      (null, null, null)
    +    }
    +  }
    +
    +  private def setFullFileStatus(
    +      hadoopConf: Configuration,
    +      group: String,
    +      permission: FsPermission,
    +      aclStatus: AclStatus,
    +      fs: FileSystem,
    +      target: Path): Unit = {
    +    try {
    +      // use FsShell to change group, permissions, and extended ACL's recursively
    +      val fsShell = new FsShell
    +      fsShell.setConf(hadoopConf)
    +      fsShell.run(Array[String]("-chgrp", "-R", group, target.toString))
    --- End diff --
    
    These code copy from https://github.com/apache/hive/blob/rel/release-1.2.2/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L722-L767
    
    I will think about whether there is a better implementation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/94636/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    I am just worried/concerned by the performance regression this might introduce in the case the table is made of many files. Can we have a benchmark using different file system (at least HDFS, S3) with hundreds and thousands of files? Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Table subdirectories should in...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r228881824
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelationCommand.scala ---
    @@ -261,4 +272,69 @@ case class InsertIntoHadoopFsRelationCommand(
           }
         }.toMap
       }
    +
    +  private def isExtendedAclEnabled(hadoopConf: Configuration): Boolean =
    +    hadoopConf.getBoolean(DFS_NAMENODE_ACLS_ENABLED_KEY, DFS_NAMENODE_ACLS_ENABLED_DEFAULT)
    +
    +  private def getFullFileStatus(
    +      conf: SQLConf,
    +      hadoopConf: Configuration,
    +      pathExists: Boolean,
    +      fs: FileSystem,
    +      file: Path): (Option[FsPermission], Option[AclStatus]) = {
    +    var permission: Option[FsPermission] = None
    +    var aclStatus: Option[AclStatus] = None
    +    if (conf.isDataSouceTableInheritPerms && pathExists) {
    +      permission = Some(fs.getFileStatus(file).getPermission)
    +      if (isExtendedAclEnabled(hadoopConf)) aclStatus = Some(fs.getAclStatus(file))
    +    }
    +    (permission, aclStatus)
    --- End diff --
    
    I would remove `var` by, for instance:
    
    ```scala
    val shouldFetchPermission = conf.isDataSouceTableInheritPerms && pathExists
    val shouldFetchAclStatus = shouldFetchPermission && isExtendedAclEnabled(hadoopConf)
    
    val permission = if (shouldFetchPermission) {
      Some(fs.getFileStatus(file).getPermission)
    } else {
      None
    }
    val aclStatus = if (shouldFetchAclStatus) {
      Some(fs.getAclStatus(file))
    } else {
      None
    }
    
    (permission, aclStatus)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22078: [SPARK-25085][SQL] Insert overwrite a non-partiti...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22078#discussion_r209957030
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -626,6 +626,14 @@ object SQLConf {
         .stringConf
         .createWithDefault("parquet")
     
    +  val DATA_SOURCE_TABLE_INHERIT_PERMS = buildConf("spark.sql.datasource.table.inherit.perms")
    +    .internal()
    +    .doc("Set this to true if the table directories should be inheriting the permission " +
    +      "of the warehouse or database directory " +
    --- End diff --
    
    yes, but that does a different thing IIUC, ie. that copies the permissions from the warehouse dir for the new tables created there. Here we are not doing the same, we are copying the permissions from the previous table dir to the new table dir after the overwrite.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2082/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94692/testReport)** for PR 22078 at commit [`d5dea08`](https://github.com/apache/spark/commit/d5dea089bfc86a7776d53435cdfd2af6f7de0b6a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2137/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22078: [SPARK-25085][SQL] Insert overwrite a non-partitioned ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22078
  
    **[Test build #94925 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94925/testReport)** for PR 22078 at commit [`f1574d5`](https://github.com/apache/spark/commit/f1574d5a3c4bfbd1a202154e69cff0dc81283e35).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org