You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by zuotingbing <gi...@git.apache.org> on 2017/05/04 07:25:49 UTC

[GitHub] spark pull request #17858: [SPARK-20594]The staging directory should be appe...

GitHub user zuotingbing opened a pull request:

    https://github.com/apache/spark/pull/17858

    [SPARK-20594]The staging directory should be appended with ".hive-staging" to avoid being deleted if we set hive.exec.stagingdir under the table directory without start with "."

    JIRA Issue: https://issues.apache.org/jira/browse/SPARK-20594
    
    ## What changes were proposed in this pull request?
    
    The staging directory should be appended with ".hive-staging" to avoid being deleted if we set hive.exec.stagingdir under the table directory without start with "."
    
    ## How was this patch tested?
    
    Existing tests and manual tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zuotingbing/spark spark-stagingdir

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17858.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17858
    
----
commit c154f3a8e9d33270e73678764b1f3dc384648e1b
Author: zuotingbing <zu...@zte.com.cn>
Date:   2017-05-04T07:11:12Z

    [SPARK-20594]The staging directory should be appended with ".hive-staging" to avoid being deleted if we set hive.exec.stagingdir under the table directory without start with "."

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115188063
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -222,7 +222,7 @@ case class InsertIntoHiveTable(
         val externalCatalog = sparkSession.sharedState.externalCatalog
         val hiveVersion = externalCatalog.asInstanceOf[HiveExternalCatalog].client.version
         val hadoopConf = sessionState.newHadoopConf()
    -    val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
    +    val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging") + "/.hive-staging"
    --- End diff --
    
    How about?
    
    ```Scala
        // SPARK-20594 After Hive 1.1, Hive will create the staging directory under the table directory,
        // and when moving staging directory to table directory, Hive will still empty the table
        // directory, but will exclude the staging directory, which start with "." or "_".
        val stagingDir =
          new Path(hadoopConf.get("hive.exec.stagingdir", ".hive-staging"), ".hive-staging").toString
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115929103
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    --- End diff --
    
    Without your fix, this test case still passes. Could you please check it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115640830
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    --- End diff --
    
    The dir is the staging dir which seems to me should be deleted exactly. 
    BTW, if we `set hive.exec.stagingdir=/test/a/b` , the dir `/test/a` will also not be deleted by the logic before.
    Should i need to delete the parent directory only for this patch?Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76765/testReport)** for PR 17858 at commit [`bf1b4ec`](https://github.com/apache/spark/commit/bf1b4eca9b9692d2dd11a5c166446cc2d5258e86).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115856955
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    +
    +    withTable("test_table", "test_table1") {
    +      spark.range(1).write.saveAsTable("test_table")
    +
    +      // Make sure the table has also been updated.
    +      checkAnswer(
    +        sql("SELECT * FROM test_table"),
    +        Row(0)
    +      )
    +
    +      sql("CREATE TABLE test_table1 (key int)")
    +
    +      // Set hive.exec.stagingdir under the table directory without start with ".".
    +      sql("set hive.exec.stagingdir=./test")
    --- End diff --
    
    Could you set it back after this test case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115637682
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    --- End diff --
    
    The parent directory is the table directory in this case. why should we delete it?
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76839/testReport)** for PR 17858 at commit [`639d63a`](https://github.com/apache/spark/commit/639d63a20e94523f3443bc83b272fc60c1f5627a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115587051
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    --- End diff --
    
    fs.deleteOnExit(dir) deletes dir, but the parent directory is still there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Do you really need to force this? or, is it just that any path relative to the output dir has to be a hidden directory starting with "." or "_"? For example, right now this prevents me from making the staging dir "/foo/bar" but I don't see a reason to disallow that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76765/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    @gatorsmile it seems my mistake, i will try to fix this. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76617/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115973153
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    --- End diff --
    
    Hmm, we must create table rather than simplify by `spark.range(1).write.saveAsTable("test_table")`. Thanks again. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115671172
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    --- End diff --
    
    Ok. This fix does not make things worse. I can accept it. cc @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115671640
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,45 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  private def dropTables(tableNames: String*): Unit = {
    +    tableNames.foreach { name =>
    +      sql(s"DROP TABLE IF EXISTS $name")
    +    }
    +  }
    +
    +  test(
    +    """SPARK-20594: The staging directory should be appended with ".hive-staging"
    +      |to avoid being deleted if we set hive.exec.stagingdir under the table directory
    +      |without start with "."""".stripMargin) {
    +
    +    dropTables("test_table", "test_table1")
    --- End diff --
    
    ```Scala
     withTable("test_table", "test_table1") {
        ...
      }
    ```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    In this case, Hive will create the staging directory under the table directory, and when moving staging directory to table directory, Hive will still empty the table directory, but will exclude the staging directory which start with "." or "_"
    
    `public static final PathFilter HIDDEN_FILES_PATH_FILTER = new PathFilter() {
        public boolean accept(Path p) {
          String name = p.getName();
          return !name.startsWith("_") && !name.startsWith(".");
        }
      };`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76566 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76566/testReport)** for PR 17858 at commit [`de938ed`](https://github.com/apache/spark/commit/de938ed91f6e62b02fed32c7123f3aae5d51d9f7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115671752
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,45 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  private def dropTables(tableNames: String*): Unit = {
    +    tableNames.foreach { name =>
    +      sql(s"DROP TABLE IF EXISTS $name")
    +    }
    +  }
    --- End diff --
    
    This is not needed. You can call `withTable`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115879552
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,24 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +    // staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +    // under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    +      logDebug(s"The staging dir '$stagingPathName' should be a child directory starts " +
    +        s"with '.' to avoid being deleted if we set hive.exec.stagingdir under the table " +
    +        s"directory.")
    --- End diff --
    
    Nit: please remove the above TWO string Interpolation `s`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76644 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76644/testReport)** for PR 17858 at commit [`6b1b153`](https://github.com/apache/spark/commit/6b1b153e1ee9ec3e7830158d8f8eb274970929ae).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76806/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76744 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76744/testReport)** for PR 17858 at commit [`9f41436`](https://github.com/apache/spark/commit/9f414366c67e4fd6d46b4ec4d241de73e0921f86).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17858


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115672879
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    --- End diff --
    
    ok, i will update it . Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115893632
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    +
    +    withTable("test_table", "test_table1") {
    +      spark.range(1).write.saveAsTable("test_table")
    +
    +      // Make sure the table has also been updated.
    +      checkAnswer(
    +        sql("SELECT * FROM test_table"),
    +        Row(0)
    +      )
    +
    +      sql("CREATE TABLE test_table1 (key int)")
    +
    +      // Set hive.exec.stagingdir under the table directory without start with ".".
    +      sql("set hive.exec.stagingdir=./test")
    --- End diff --
    
    Hmm, you means i should add `sql("reset")` after this test case,right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    yes i tried the same thing in Hive, got the same error:
    `2017-05-08T13:48:04,634 ERROR exec.Task (:()) - Failed with exception Unable to move source hdfs://nameservice/hive/test_table1/test_hive_2017-05-08_13-47-40_660_5235248825413690559-1/-ext-10000 to destination hdfs://nameservice/hive/test_table1
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://nameservice/hive/test_table1/test_hive_2017-05-08_13-47-40_660_5235248825413690559-1/-ext-10000 to destination hdfs://nameservice/hive/test_table1
    	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2959)
    	at org.apache.hadoop.hive.ql.metadata.Hive.replaceFiles(Hive.java:3198)
    	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1805)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:355)
    	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1917)
    	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1586)
    	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1331)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1092)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1080)
    	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.io.FileNotFoundException: File hdfs://nameservice/hive/test_table1/test_hive_2017-05-08_13-47-40_660_5235248825413690559-1/-ext-10000 does not exist.
    	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:697)
    	at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
    	at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
    	at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
    	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:751)
    	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
    	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
    	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2896)
    	... 22 more
    
    2017-05-08T13:48:04,635 ERROR ql.Driver (:()) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://nameservice/hive/test_table1/test_hive_2017-05-08_13-47-40_660_5235248825413690559-1/-ext-10000 to destination hdfs://nameservice/hive/test_table1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    This will not pass the test cases, because we only deleted the child directory `.hive-staging` of the `stagingDir`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76655/testReport)** for PR 17858 at commit [`2a542e4`](https://github.com/apache/spark/commit/2a542e4248b52e2e5a7a9bc651598b755a81ca3a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r116137309
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    --- End diff --
    
    good choice. thanks for your time! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    @gatorsmile My production environment is spark 2.0.2 and test successful. Is there something be changed since 2.0.2 for this case? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76839/testReport)** for PR 17858 at commit [`639d63a`](https://github.com/apache/spark/commit/639d63a20e94523f3443bc83b272fc60c1f5627a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76655/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    if this is a hive bug, this patch seems a valid workaround for Spark SQL.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76566/testReport)** for PR 17858 at commit [`de938ed`](https://github.com/apache/spark/commit/de938ed91f6e62b02fed32c7123f3aae5d51d9f7).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115880117
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    +
    +    withTable("test_table", "test_table1") {
    +      spark.range(1).write.saveAsTable("test_table")
    +
    +      // Make sure the table has also been updated.
    +      checkAnswer(
    +        sql("SELECT * FROM test_table"),
    +        Row(0)
    +      )
    +
    +      sql("CREATE TABLE test_table1 (key int)")
    +
    +      // Set hive.exec.stagingdir under the table directory without start with ".".
    +      sql("set hive.exec.stagingdir=./test")
    +
    +      // Now overwrite.
    +      sql("INSERT OVERWRITE TABLE test_table1 SELECT * FROM test_table")
    --- End diff --
    
    We do not need `test_table`, right? 
    
    `INSERT OVERWRITE TABLE test_table1 SELECT 1`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115415586
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).startsWith(".")) {
    --- End diff --
    
    Sorry i do not follow your logic. Correct me if I'm wrong, but isn't the logic of dropping the created staging directory was already there before with `fs.deleteOnExit(dir)`?
    As @cloud-fan said this patch seems a valid workaround in Spark SQL for this case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Thanks! Merging to master/2.2


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115586962
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).startsWith(".")) {
    --- End diff --
    
    `fs.deleteOnExit(dir)` deletes `dir`, but the parent directory is still there.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115974006
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,45 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  private def dropTables(tableNames: String*): Unit = {
    +    tableNames.foreach { name =>
    +      sql(s"DROP TABLE IF EXISTS $name")
    +    }
    +  }
    --- End diff --
    
    ok


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115672012
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,45 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  private def dropTables(tableNames: String*): Unit = {
    +    tableNames.foreach { name =>
    +      sql(s"DROP TABLE IF EXISTS $name")
    +    }
    +  }
    +
    +  test(
    +    """SPARK-20594: The staging directory should be appended with ".hive-staging"
    +      |to avoid being deleted if we set hive.exec.stagingdir under the table directory
    +      |without start with "."""".stripMargin) {
    +
    +    dropTables("test_table", "test_table1")
    +
    +    sql("CREATE TABLE test_table (key int, value string)")
    +
    +    // Add some data.
    +    testData.write.mode(SaveMode.Append).insertInto("test_table")
    --- End diff --
    
    You can simplify the above two lines by ` spark.range(1).write.saveAsTable("test_table")`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115404865
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).startsWith(".")) {
    --- End diff --
    
    This is just to hide the issue and make the test cases passed, right?
    
    We need to drop the created staging directory no matter what is the value users set. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76744/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76617 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76617/testReport)** for PR 17858 at commit [`6b22d3e`](https://github.com/apache/spark/commit/6b22d3ea694c4133965ddface73c52c3566cd156).
     * This patch **fails SparkR unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76617 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76617/testReport)** for PR 17858 at commit [`6b22d3e`](https://github.com/apache/spark/commit/6b22d3ea694c4133965ddface73c52c3566cd156).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76765/testReport)** for PR 17858 at commit [`bf1b4ec`](https://github.com/apache/spark/commit/bf1b4eca9b9692d2dd11a5c166446cc2d5258e86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115974195
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,45 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  private def dropTables(tableNames: String*): Unit = {
    +    tableNames.foreach { name =>
    +      sql(s"DROP TABLE IF EXISTS $name")
    +    }
    +  }
    +
    +  test(
    +    """SPARK-20594: The staging directory should be appended with ".hive-staging"
    +      |to avoid being deleted if we set hive.exec.stagingdir under the table directory
    +      |without start with "."""".stripMargin) {
    +
    +    dropTables("test_table", "test_table1")
    +
    +    sql("CREATE TABLE test_table (key int, value string)")
    +
    +    // Add some data.
    +    testData.write.mode(SaveMode.Append).insertInto("test_table")
    --- End diff --
    
    no, as i tested we must create table rather than simplify the above two lines by `spark.range(1).write.saveAsTable("test_table")`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76806/testReport)** for PR 17858 at commit [`4e1b6a0`](https://github.com/apache/spark/commit/4e1b6a037070dcbcb956f600abf152ad2d6dcca2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115674151
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,45 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  private def dropTables(tableNames: String*): Unit = {
    +    tableNames.foreach { name =>
    +      sql(s"DROP TABLE IF EXISTS $name")
    +    }
    +  }
    +
    +  test(
    +    """SPARK-20594: The staging directory should be appended with ".hive-staging"
    +      |to avoid being deleted if we set hive.exec.stagingdir under the table directory
    +      |without start with "."""".stripMargin) {
    +
    +    dropTables("test_table", "test_table1")
    --- End diff --
    
    yes you are right. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76744/testReport)** for PR 17858 at commit [`9f41436`](https://github.com/apache/spark/commit/9f414366c67e4fd6d46b4ec4d241de73e0921f86).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Thank you all. Delete the branch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76839/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    This sounds a bug in Hive metastore. Could you try the same thing in Hive? Do you hit the same error? Let us see how Hive behaves and then we can decide what is the best way to handle it. Thanks!
    
    BTW, you need to create a test case. For example, `InsertIntoHiveTableSuite.scala`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115879727
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,24 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +    // staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +    // under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    --- End diff --
    
    Nit: Please move `&&` in line 110


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76806/testReport)** for PR 17858 at commit [`4e1b6a0`](https://github.com/apache/spark/commit/4e1b6a037070dcbcb956f600abf152ad2d6dcca2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by zuotingbing <gi...@git.apache.org>.
Github user zuotingbing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115638060
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    +    if (FileUtils.isSubDir(new Path(stagingPathName), inputPath, fs)
    +      && !stagingPathName.stripPrefix(inputPathName).stripPrefix(File.separator).startsWith(".")) {
    --- End diff --
    
    `dir` is the staging dir which exactly should be deleted.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76644/testReport)** for PR 17858 at commit [`6b1b153`](https://github.com/apache/spark/commit/6b1b153e1ee9ec3e7830158d8f8eb274970929ae).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be append...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76566/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r115670964
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---
    @@ -97,12 +97,23 @@ case class InsertIntoHiveTable(
         val inputPathUri: URI = inputPath.toUri
         val inputPathName: String = inputPathUri.getPath
         val fs: FileSystem = inputPath.getFileSystem(hadoopConf)
    -    val stagingPathName: String =
    +    var stagingPathName: String =
           if (inputPathName.indexOf(stagingDir) == -1) {
             new Path(inputPathName, stagingDir).toString
           } else {
             inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
           }
    +
    +    // SPARK-20594: The staging directory should be a child directory starts with "." to avoid
    +    // being deleted if we set hive.exec.stagingdir under the table directory.
    --- End diff --
    
    How about?
    
    > SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the staging directory needs to avoid being deleted when users set `hive.exec.stagingdir` under the table directory. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17858: [SPARK-20594][SQL]The staging directory should be...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17858#discussion_r116039639
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertIntoHiveTableSuite.scala ---
    @@ -494,4 +494,34 @@ class InsertIntoHiveTableSuite extends QueryTest with TestHiveSingleton with Bef
             spark.table("t").write.insertInto(tableName)
           }
       }
    +
    +  test(
    +    """SPARK-20594: This is a walk-around fix to resolve a Hive bug. Hive requires that the
    +      |staging directory needs to avoid being deleted when users set hive.exec.stagingdir
    +      |under the table directory.""".stripMargin) {
    --- End diff --
    
    uh, because that is not to create a Hive table. How about simplifying the test case to?
    ```Scala
      test("SPARK-20594: hive.exec.stagingdir was deleted by Hive") {
        // Set hive.exec.stagingdir under the table directory without start with ".".
        withSQLConf("hive.exec.stagingdir" -> "./test") {
          withTable("test_table") {
            sql("CREATE TABLE test_table (key int)")
            sql("INSERT OVERWRITE TABLE test_table SELECT 1")
            checkAnswer(sql("SELECT * FROM test_table"), Row(1))
          }
        }
      }
    ```



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    **[Test build #76655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76655/testReport)** for PR 17858 at commit [`2a542e4`](https://github.com/apache/spark/commit/2a542e4248b52e2e5a7a9bc651598b755a81ca3a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76644/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17858: [SPARK-20594][SQL]The staging directory should be a chil...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17858
  
    LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org