Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/02/25 03:02:01 UTC

[GitHub] [spark] moomindani opened a new pull request #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

moomindani opened a new pull request #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690
 
 
   
   
   ### What changes were proposed in this pull request?
   
   I added two config parameters (the same parameter names used in Hive):
   
       spark.hadoop.hive.blobstore.supported.schemes (= s3a, s3, s3n)
       spark.hadoop.hive.blobstore.use.blobstore.as.scratchdir (= true)
   
   If the URI scheme of the target table's location is included in `hive.blobstore.supported.schemes` AND `hive.blobstore.use.blobstore.as.scratchdir` = false, then HDFS (MRTempDir) is used for temporary data.
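
   As a rough illustration of the rule above (not the actual code in this PR), the decision can be sketched as follows. The names `ScratchDirDecision`, `supportedSchemes`, `useBlobstoreAsScratchDir` and `useHdfsScratchDir` are made up for this example, and the configuration is modelled as a plain `Map` instead of a Hadoop `Configuration`.

   ```scala
   import java.net.URI
   import java.util.Locale

   // Illustrative sketch only: these helper names are hypothetical, not the PR's code.
   object ScratchDirDecision {
     // Parse a comma-separated scheme list such as "s3a, s3, s3n".
     def supportedSchemes(conf: Map[String, String]): Set[String] =
       conf.getOrElse("hive.blobstore.supported.schemes", "s3a,s3,s3n")
         .split(",").map(_.trim.toLowerCase(Locale.ROOT)).filter(_.nonEmpty).toSet

     // Defaults to true, matching the default listed above.
     def useBlobstoreAsScratchDir(conf: Map[String, String]): Boolean =
       conf.getOrElse("hive.blobstore.use.blobstore.as.scratchdir", "true").toBoolean

     // True when temporary data should go to HDFS (MRTempDir) instead of the blob store.
     def useHdfsScratchDir(tableLocation: String, conf: Map[String, String]): Boolean = {
       // A location without a scheme yields null here, so fall back to "".
       val scheme = Option(new URI(tableLocation).getScheme).getOrElse("").toLowerCase(Locale.ROOT)
       supportedSchemes(conf).contains(scheme) && !useBlobstoreAsScratchDir(conf)
     }
   }
   ```

   With `hive.blobstore.use.blobstore.as.scratchdir` set to `false`, an `s3a://` location selects the HDFS scratch directory in this sketch, while an `hdfs://` location or the default setting keeps the current behavior.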
   
   The major benefits of this patch are:
   
   - Better performance when writing data to blob storage
   - Lower costs for blob storage I/O
   - Less exposure to the effects of eventual consistency
   
   ### Why are the changes needed?
   Currently, Spark writes temporary data to blob storage such as S3, which hurts performance and increases costs for users. This pull request adds an option to change the temporary data location from blob storage to HDFS.
   
   ### Does this PR introduce any user-facing change?
   Yes
   
   - When `spark.hadoop.hive.blobstore.use.blobstore.as.scratchdir=true`, there is no change.
   - When `spark.hadoop.hive.blobstore.use.blobstore.as.scratchdir=false`, HDFS is used for temporary storage instead of blob storage.
   
   ### How was this patch tested?
   
   #### Unit test
   I added two new unit test suites:
   - SaveAsHiveFileSuite: Test the temp path for both blob storage and non-blob storage
   ```
   $ build/sbt
   > project hive
   > testOnly *SaveAsHiveFileSuite
   ...
   [info] ScalaTest
   [info] Run completed in 14 seconds, 779 milliseconds.
   [info] Total number of tests run: 2
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   [info] Passed: Total 2, Failed 0, Errors 0, Passed 2
   [success] Total time: 310 s, completed Feb 19, 2020 2:09:58 AM
   ```
   
   - BlobStorageUtilsSuite: Test utility functions for managing configurations
   
   ```
   > testOnly *BlobStorageUtilsSuite
   [info] ScalaTest
   [info] Run completed in 12 seconds, 922 milliseconds.
   [info] Total number of tests run: 6
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   [info] Passed: Total 6, Failed 0, Errors 0, Passed 6
   [success] Total time: 18 s, completed Feb 19, 2020 2:11:46 AM
   ```
   
   #### Integration test
   In this integration test, I used the same data and the same EMR cluster, changing only the value of `spark.hadoop.hive.blobstore.use.blobstore.as.scratchdir`.
   
   ##### Data Tables
   ```
   CREATE EXTERNAL TABLE plain_cf (
     Date DATE,
     Time STRING,
     Location STRING,
     Bytes INT,
     RequestIP STRING,
     Method STRING,
     Host STRING,
     Uri STRING,
     Status INT,
     Referrer STRING,
     os STRING,
     Browser STRING,
     BrowserVersion STRING
         ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
         WITH SERDEPROPERTIES (
         "input.regex" = "^(?!#)([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+([^ ]+)\\s+[^\(]+[\(]([^\;]+).*\%20([^\/]+)[\/](.*)$"
         ) LOCATION 's3a://athena-examples/cloudfront/plaintext/';
   
   CREATE TABLE spark_cf_txt1 (
     Date DATE,
     Time STRING,
     Location STRING)
   STORED AS TEXTFILE
   LOCATION
     's3a://sekiyama-bucket/spark/dev/SPARK21514/spark_cf_txt1';
   
   CREATE TABLE spark_cf_txt2 (
     Date DATE,
     Time STRING,
     Location STRING)
   STORED AS TEXTFILE
   LOCATION
     's3a://sekiyama-bucket/spark/dev/SPARK21514/spark_cf_txt2';
   ```
   
   ##### Scenario 1 (spark.hadoop.hive.blobstore.use.blobstore.as.scratchdir=true)
   
   ```
   spark-sql> INSERT OVERWRITE TABLE spark_cf_txt1 SELECT Date,Time,Location FROM plain_cf;
   ```
   
   Here's the duration per trial (10 trials in total)
   - Time taken: 21.197 seconds
   - Time taken: 27.524 seconds
   - Time taken: 21.385 seconds
   - Time taken: 19.313 seconds
   - Time taken: 20.545 seconds
   - Time taken: 22.509 seconds
   - Time taken: 21.377 seconds
   - Time taken: 20.134 seconds
   - Time taken: 20.632 seconds
   - Time taken: 20.242 seconds
   
   **Average duration: 21.4858 seconds**
   
   ##### Scenario 2 (spark.hadoop.hive.blobstore.use.blobstore.as.scratchdir=false)
   
   ```
   spark-sql> INSERT OVERWRITE TABLE spark_cf_txt2 SELECT Date,Time,Location FROM plain_cf;
   ```
   
   Here's the duration per trial (10 trials in total)
   
   - Time taken: 19.624 seconds
   - Time taken: 15.178 seconds
   - Time taken: 15.645 seconds
   - Time taken: 20.185 seconds
   - Time taken: 15.821 seconds
   - Time taken: 15.545 seconds
   - Time taken: 15.612 seconds
   - Time taken: 14.925 seconds
   - Time taken: 15.691 seconds
   - Time taken: 15.747 seconds
   
   **Average duration: 16.3973 seconds**
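
   For a quick comparison of the two scenarios, the per-trial numbers above can be summarized with a small script. This is only arithmetic on the timings listed in this description; the object and value names are made up for the example.

   ```scala
   // Illustrative only: summarizes the trial timings listed above.
   object TrialSummary {
     // Scenario 1: use.blobstore.as.scratchdir=true
     val scenario1 = Seq(21.197, 27.524, 21.385, 19.313, 20.545, 22.509, 21.377, 20.134, 20.632, 20.242)
     // Scenario 2: use.blobstore.as.scratchdir=false
     val scenario2 = Seq(19.624, 15.178, 15.645, 20.185, 15.821, 15.545, 15.612, 14.925, 15.691, 15.747)

     def mean(xs: Seq[Double]): Double = xs.sum / xs.size

     // Relative reduction in mean duration from `before` to `after`, as a fraction.
     def reduction(before: Seq[Double], after: Seq[Double]): Double =
       (mean(before) - mean(after)) / mean(before)

     def main(args: Array[String]): Unit = {
       println(f"Scenario 1 mean: ${mean(scenario1)}%.4f s")
       println(f"Scenario 2 mean: ${mean(scenario2)}%.4f s")
       println(f"Reduction: ${reduction(scenario1, scenario2) * 100}%.1f%%")
     }
   }
   ```

   On these numbers the mean drops from about 21.49 s to about 16.40 s, roughly a 24% reduction. This is for one query on one cluster, so it should be read as an indication rather than a general benchmark.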

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592957295
 
 
   **[Test build #119124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119124/testReport)** for PR 27690 at commit [`793bacb`](https://github.com/apache/spark/commit/793bacbe7a16d692097656af9ad18f8d1c7ea1f9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton with MockitoSugar `



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592048126
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593253904
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592189546
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23798/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592235944
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119052/
   Test FAILed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r384853033
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   Yes, it comes from here: https://github.com/apache/hive/blob/rel/release-2.3.0/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L588



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593205447
 
 
   **[Test build #119149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119149/testReport)** for PR 27690 at commit [`3ec4418`](https://github.com/apache/spark/commit/3ec4418e287c800242e890fa2259577246ab1a29).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592954622
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119123/
   Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396799099
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
+    logDebug(s"session path '${sessionPath.toString}' is used")
+
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)
+    logDebug(s"MR scratch dir '$mrScratchDir/-mr-10000' is used")
+    new Path(mrScratchDir, "-mr-10000")
+  }
+
+  private def isBlobStoragePath(path: Path): Boolean = {
+    path != null && isBlobStorageScheme(Option(path.toUri.getScheme).getOrElse(""))
 
 Review comment:
   As far as I have tested, it works even when `fs.default.name` is set to `s3`.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385119828
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   Why v2.3 instead of v3.x? Any reason?



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385160313
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   I believe that currently Spark's default Hive version is 2.3, so I just used the same version of Hive here for simplicity.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600345505
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592302800
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23811/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592189007
 
 
   **[Test build #119052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119052/testReport)** for PR 27690 at commit [`b417289`](https://github.com/apache/spark/commit/b417289729ca187658be903128b6c2c020dafdb0).



[GitHub] [spark] kiszk commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591907302
 
 
   cc @dongjoon-hyun @wangyum



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385098326
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
 
 Review comment:
   Do we need this utility class? ISTM we can move the three methods below into `SaveAsHiveFile` as private ones.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-590660177
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592957542
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119124/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-590659867
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592954431
 
 
   **[Test build #119123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119123/testReport)** for PR 27690 at commit [`3b75929`](https://github.com/apache/spark/commit/3b75929dc25e24120c43ef260f10434eb45769f5).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593205708
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592189538
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594380411
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119273/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003384
 
 
   **[Test build #121464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121464/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593270969
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386185658
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFileSuite.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.scalatestplus.mockito.MockitoSugar
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton with MockitoSugar {
+  test("sessionScratchDir = '/tmp/hive/user_a/session_b' & scratchDir = '/tmp/hive_scratch'") {
+    val insertIntoHiveTable = new InsertIntoHiveTable(
+      mock[CatalogTable], Map.empty[String, Option[String]],
+      mock[LogicalPlan], true, false, Seq.empty[String])
 
 Review comment:
   That makes sense. Thanks, I had not noticed it.
   I removed Mockito and now just pass null, per the link.



[GitHub] [spark] kiszk commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600227118
 
 
   retest this please



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593219493
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23894/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-602925727
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592940250
 
 
   **[Test build #119123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119123/testReport)** for PR 27690 at commit [`3b75929`](https://github.com/apache/spark/commit/3b75929dc25e24120c43ef260f10434eb45769f5).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592942933
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23866/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600231119
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592048126
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603584398
 
 
   retest this please



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396179637
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   It is tested in this unit test.
   https://github.com/apache/spark/pull/27690/files#diff-ee422d26750ba346c81b7f85b4b14577R46
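
For readers following the hunk above, here is a minimal sketch of the lookup pattern it uses: each directory is read from configuration with a default. Plain `Map`s stand in for the Hadoop configuration and the Hive client configuration so the snippet has no dependencies. The names `ScratchDirs`, `ScratchDirSketch`, `readScratchDirs` and `effectiveScratchDir` are hypothetical. The rule in `effectiveScratchDir` (prefer the session path when it is set) is an assumption for illustration only; the thread does not show how the PR chooses between the two directories.

```scala
// Hypothetical container for the three values read in the hunk above.
final case class ScratchDirs(
    stagingDir: String,
    scratchDir: String,
    sessionScratchDir: String)

object ScratchDirSketch {
  // Plain Maps are stand-ins (an assumption) for Hadoop's Configuration and
  // the Hive client conf; keys and defaults are the ones shown in the diff.
  def readScratchDirs(
      hadoopConf: Map[String, String],
      hiveClientConf: Map[String, String]): ScratchDirs =
    ScratchDirs(
      stagingDir = hadoopConf.getOrElse("hive.exec.stagingdir", ".hive-staging"),
      scratchDir = hadoopConf.getOrElse("hive.exec.scratchdir", "/tmp/hive"),
      sessionScratchDir = hiveClientConf.getOrElse("_hive.hdfs.session.path", ""))

  // Assumed selection rule, NOT taken from the PR: use the session scratch
  // dir when Hive has set one, otherwise fall back to hive.exec.scratchdir.
  def effectiveScratchDir(dirs: ScratchDirs): String =
    if (dirs.sessionScratchDir.nonEmpty) dirs.sessionScratchDir
    else dirs.scratchDir
}
```

With the values from the test name quoted earlier in the thread (`/tmp/hive/user_a/session_b` and `/tmp/hive_scratch`), this sketch would pick the session path. With neither key set, it falls back to `/tmp/hive`.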



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-602925290
 
 
   **[Test build #120223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120223/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592957538
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593219493
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23894/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592942933
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23866/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592940348
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23865/
   Test PASSed.



[GitHub] [spark] kiszk commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r384642578
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   Is this method derived from the code around [here](https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L446), or from somewhere else?



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603585552
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25006/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592302793
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592954621
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592048141
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119027/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591946813
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003522
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592942860
 
 
   **[Test build #119124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119124/testReport)** for PR 27690 at commit [`793bacb`](https://github.com/apache/spark/commit/793bacbe7a16d692097656af9ad18f8d1c7ea1f9).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-602925731
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24936/
   Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396285657
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   As you mentioned in another comment, an empty string is returned when `_hive.hdfs.session.path` is not set.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385105052
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -124,11 +147,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
+    logDebug(s"path '${path.toString}' is used")
+    logDebug(s"staging dir '$stagingDir' is used")
+    logDebug(s"scratch dir '$scratchDir' is used")
 
 Review comment:
   Can you pack the three `logDebug`s into one?
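
A minimal sketch of what a single packed message could look like. The helper name `describeDirs` and the example path are hypothetical, and `println` stands in for `logDebug`, which is only available inside Spark's `Logging` trait:

```scala
object PackedDebugLog {
  // Hypothetical helper: builds the one message that would replace the three logDebug calls.
  def describeDirs(path: String, stagingDir: String, scratchDir: String): String =
    s"path '$path', staging dir '$stagingDir' and scratch dir '$scratchDir' are used"

  def main(args: Array[String]): Unit = {
    // In SaveAsHiveFile the string would be passed to logDebug; println stands in here.
    // The path below is an illustrative value, not taken from the PR.
    println(describeDirs("s3a://bucket/table", ".hive-staging", "/tmp/hive"))
  }
}
```

Because `logDebug` takes its message by name, a single interpolated string is only built when debug logging is enabled.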



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603002292
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603653204
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600345509
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119946/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600231129
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24669/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003524
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26148/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594305794
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24012/
   Test PASSed.



[GitHub] [spark] moomindani commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592943204
 
 
   > Probably, we need to write how-to-use this functionality somewhere in documents.
   
   Sounds like a good idea.
   I am currently thinking of adding these two parameters to the "Execution Behavior" section.
   Is that a good place for them?
   https://spark.apache.org/docs/latest/configuration.html#execution-behavior



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593270969
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591944135
 
 
   ok to test



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616046899
 
 
   **[Test build #121464 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121464/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r395532529
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
 +    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
+    logDebug(s"session path '${sessionPath.toString}' is used")
+
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)
+    logDebug(s"MR scratch dir '$mrScratchDir/-mr-10000' is used")
+    new Path(mrScratchDir, "-mr-10000")
+  }
+
+  private def isBlobStoragePath(path: Path): Boolean = {
+    path != null && isBlobStorageScheme(Option(path.toUri.getScheme).getOrElse(""))
 
 Review comment:
   I wonder how Hive handles the default scheme. Does it work when `fs.default.name` is set to `s3` too?
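
To make the concern concrete: the check in the diff falls back to an empty scheme when the path carries none, so a scheme-less path would presumably not be recognised as blob storage even if the default file system is one. A minimal sketch of one way to account for that, using `java.net.URI` in place of Hadoop's `Path` so it stays self-contained. The object name, the `effectiveScheme` helper, and the scheme list are all hypothetical and not part of the PR:

```scala
import java.net.URI

object BlobSchemeCheck {
  // Hypothetical list of blob-store schemes, for illustration only.
  val blobSchemes: Set[String] = Set("s3", "s3a", "s3n")

  // Use the path's own scheme if it has one, otherwise the default file system's scheme.
  def effectiveScheme(path: URI, defaultFs: URI): String =
    Option(path.getScheme).orElse(Option(defaultFs.getScheme)).getOrElse("")

  // A path counts as blob storage when its effective scheme is in the list.
  def isBlobStoragePath(path: URI, defaultFs: URI): Boolean =
    blobSchemes.contains(effectiveScheme(path, defaultFs).toLowerCase)
}
```

With this fallback, `/warehouse/t` is treated as blob storage when the default file system is `s3://bucket/`, and as non-blob storage when it is an HDFS URI. Whether the path is already fully qualified by the time it reaches `isBlobStoragePath` in the PR is not visible from this hunk.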



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592302793
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385496854
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -124,11 +147,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
+    logDebug(s"path '${path.toString}' is used")
+    logDebug(s"staging dir '$stagingDir' is used")
+    logDebug(s"scratch dir '$scratchDir' is used")
 
 Review comment:
   Yes, I packed these four (3+1) `logDebug` calls into one.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603002304
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120223/
   Test FAILed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386081505
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +162,23 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   Could you add a new conf under the `spark.sql.hive.` prefix instead of reusing the Hive name? Also, please follow the naming rule: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala#L20-L47
   
   By doing so, [sql/create-docs.sh](https://github.com/apache/spark/blob/master/sql/create-docs.sh) could automatically generate a doc for the conf.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591946813
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593205708
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396830905
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
 +    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
 
 Review comment:
   Modified.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396289647
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
 +    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
 
 Review comment:
   Thank you for the comment. As you said, `Option` is not needed here.
   I will rewrite this as `val sessionPath = if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir`.
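
A minimal sketch of the two variants discussed in this comment, to make the behavioural difference explicit. The object and method names (`SessionPathSketch`, `viaOption`, `viaIf`) are hypothetical and exist only for this illustration; the two expressions themselves are the ones quoted in the diff and in the reply.

```scala
object SessionPathSketch {
  // Variant from the diff: falls back to scratchDir when the session
  // directory is empty, and also when it is null.
  def viaOption(sessionScratchDir: String, scratchDir: String): String =
    Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)

  // Variant proposed in the reply: same result for non-null input,
  // but throws a NullPointerException if sessionScratchDir is null.
  def viaIf(sessionScratchDir: String, scratchDir: String): String =
    if (!sessionScratchDir.isEmpty) sessionScratchDir else scratchDir
}
```

The two agree whenever the session directory is a non-null string. Elsewhere in this PR the value is read with `getConf("_hive.hdfs.session.path", "")`, so an empty string rather than `null` appears to be the expected "unset" value, which is presumably why the `Option` wrapper was considered unnecessary.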



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603585544
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600230513
 
 
   **[Test build #119946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119946/testReport)** for PR 27690 at commit [`76a189e`](https://github.com/apache/spark/commit/76a189e7070c0e279e599f88f97fe76f218e055b).



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396830941
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -751,6 +751,20 @@ object SQLConf {
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
     .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_BLOBSTORE_SUPPORTED_SCHEMES =
+    buildConf("spark.sql.hive.blobstore.supported.schemes")
+      .doc("Comma-separated list of supported blobstore schemes.")
 
 Review comment:
   Modified.
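
For illustration, a minimal sketch of how a comma-separated scheme list like the one described in the `.doc(...)` text above might be parsed and matched against a location. The names `BlobstoreSchemeSketch`, `parseSchemes`, and `isBlobstoreUri` are hypothetical, and the sketch uses `java.net.URI` rather than Hadoop's `Path` so that it stays self-contained; it is not the PR's implementation.

```scala
import java.net.URI
import java.util.Locale

object BlobstoreSchemeSketch {
  // Split a comma-separated setting into trimmed, lower-cased scheme names,
  // dropping empty entries such as those produced by ",,".
  def parseSchemes(raw: String): Set[String] =
    raw.split(",").map(_.trim.toLowerCase(Locale.ROOT)).filter(_.nonEmpty).toSet

  // True when the URI has a scheme and that scheme is in the configured set.
  // A scheme-less location such as "/tmp/hive" is never treated as a blobstore.
  def isBlobstoreUri(uri: URI, schemes: Set[String]): Boolean =
    Option(uri.getScheme).exists(s => schemes.contains(s.toLowerCase(Locale.ROOT)))
}
```

Normalizing case on both sides is a design choice of this sketch, made so that a value such as `S3A` in the setting still matches an `s3a://` location.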



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591946463
 
 
   **[Test build #119027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119027/testReport)** for PR 27690 at commit [`b417289`](https://github.com/apache/spark/commit/b417289729ca187658be903128b6c2c020dafdb0).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616047416
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386024959
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
 
 Review comment:
   I moved these functions into `SaveAsHiveFile` as private functions.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592940342
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386081888
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +162,23 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
+    logDebug(s"path '${path.toString}', staging dir '$stagingDir', " +
+      s"scratch dir '$scratchDir', session scratch dir '$sessionScratchDir' are used")
+
     if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
       oldVersionExternalTempPath(path, hadoopConf, scratchDir)
     } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {
-      newVersionExternalTempPath(path, hadoopConf, stagingDir)
+      // HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
+      // Copied from Context.java#getTempDirForPath of Hive 2.3
+      if (isBlobStoragePath(hadoopConf, path)
+        && !useBlobStorageAsScratchDir(hadoopConf)) {
+        getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)
 
 Review comment:
   By the way, do we need to use the same config name as Hive?
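
The branch in the diff above can be read as a two-input decision. The sketch below restates it with plain booleans; the names `TempPathChoiceSketch`, `TempLocation`, and `choose` are hypothetical. It also assumes that the `else` branch, which is cut off in the quoted hunk, keeps the previous `newVersionExternalTempPath` behaviour.

```scala
object TempPathChoiceSketch {
  sealed trait TempLocation
  // Stands in for newVersionExternalTempPath: staging directory next to the target.
  case object StagingUnderTarget extends TempLocation
  // Stands in for getMRTmpPath: scratch directory outside the blobstore.
  case object ScratchOutsideBlobstore extends TempLocation

  // Only a blobstore target with "use blobstore as scratch dir" disabled
  // is redirected to the scratch directory; everything else is unchanged.
  def choose(targetIsBlobstore: Boolean, useBlobstoreAsScratch: Boolean): TempLocation =
    if (targetIsBlobstore && !useBlobstoreAsScratch) ScratchOutsideBlobstore
    else StagingUnderTarget
}
```

Whichever configuration name is finally chosen, it only feeds the `useBlobstoreAsScratch` input here, so the naming question does not change the shape of the branch.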



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386026502
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFileSuite.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.scalatest.GivenWhenThen
+import org.scalatestplus.mockito.MockitoSugar
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton
+    with MockitoSugar with GivenWhenThen {
+  test("getMRTmpPath method") {
+    val insertIntoHiveTable = new InsertIntoHiveTable(
+      mock[CatalogTable], Map.empty[String, Option[String]],
+      mock[LogicalPlan], true, false, Seq.empty[String])
+    val hadoopConf = new Configuration()
+    val scratchDir = "/tmp/hive_scratch"
+    val sessionScratchDir = "/tmp/hive/user_a/session_b"
+
+    Given(s"sessionScratchDir = '$sessionScratchDir' & scratchDir = '$scratchDir'")
+    When("get the path from getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)")
 
 Review comment:
   I see. I changed the test to use plain `assert` instead of `GivenWhenThen`.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385160626
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   I believe that currently Spark's default Hive version is 2.3, so I just used the same version of Hive here for simplicity.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592302469
 
 
   **[Test build #119066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119066/testReport)** for PR 27690 at commit [`17693b3`](https://github.com/apache/spark/commit/17693b364e5cb00c07c04f6c5e7f430329584e6f).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594305794
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24012/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600231129
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24669/
   Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386023688
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   I checked `hive/Context.java` in the master branch, and I did not see much difference.
   - master: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/Context.java
   - release 2.3.0: https://github.com/apache/hive/blob/rel/release-2.3.0/ql/src/java/org/apache/hadoop/hive/ql/Context.java#L588



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593253909
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119149/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600230513
 
 
   **[Test build #119946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119946/testReport)** for PR 27690 at commit [`76a189e`](https://github.com/apache/spark/commit/76a189e7070c0e279e599f88f97fe76f218e055b).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593219489
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592388861
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119066/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594305442
 
 
   **[Test build #119273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119273/testReport)** for PR 27690 at commit [`76a189e`](https://github.com/apache/spark/commit/76a189e7070c0e279e599f88f97fe76f218e055b).



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r410829054
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -751,6 +751,22 @@ object SQLConf {
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
     .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_BLOBSTORE_SUPPORTED_SCHEMES =
+    buildConf("spark.sql.hive.blobstore.supported.schemes")
+      .doc("Comma-separated list of supported blobstore schemes.")
 
 Review comment:
   Please add `.version("3.1.0")`; see: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L189



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592942932
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-590660177
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] kiszk commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
kiszk commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r384633899
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
+    path != null && isBlobStorageScheme(hadoopConf, Option(path.toUri.getScheme).getOrElse(""))
+  }
+
+  def isBlobStorageScheme(
+      hadoopConf: Configuration,
+      scheme: String): Boolean = {
+    val supportedBlobSchemes = hadoopConf.get("hive.blobstore.supported.schemes", "s3,s3a,s3n")
+    supportedBlobSchemes.toLowerCase(Locale.ROOT)
+      .split(",")
+      .map(_.trim)
+      .toList
 
 Review comment:
   Do we need `toList`?
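
For illustration only (not part of the patch): a minimal sketch of why `toList` should be unnecessary here. Scala arrays get `contains` through the implicit `ArrayOps` wrapper, so the result of `split(...).map(...)` can be queried directly. The object name `SchemeCheckSketch` is hypothetical, and the helper takes the scheme list as a plain string rather than reading a Hadoop `Configuration`, so that the example is self-contained.

```scala
import java.util.Locale

object SchemeCheckSketch {
  // Hypothetical stand-in for isBlobStorageScheme: the configured scheme list
  // is passed in as a string instead of being read from a Hadoop Configuration.
  def isBlobStorageScheme(supportedSchemes: String, scheme: String): Boolean = {
    supportedSchemes.toLowerCase(Locale.ROOT)
      .split(",")
      .map(_.trim)
      // Array[String] supports `contains` via ArrayOps, so no `.toList` is needed.
      .contains(scheme.toLowerCase(Locale.ROOT))
  }
}
```

With the default list from the diff, `SchemeCheckSketch.isBlobStorageScheme("s3,s3a,s3n", "S3A")` matches case-insensitively, while a scheme such as `hdfs` does not match.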



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003524
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/26148/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592048141
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119027/
   Test FAILed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386301037
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +162,23 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
+    logDebug(s"path '${path.toString}', staging dir '$stagingDir', " +
+      s"scratch dir '$scratchDir', session scratch dir '$sessionScratchDir' are used")
+
     if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
       oldVersionExternalTempPath(path, hadoopConf, scratchDir)
     } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {
-      newVersionExternalTempPath(path, hadoopConf, stagingDir)
+      // HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
+      // Copied from Context.java#getTempDirForPath of Hive 2.3
+      if (isBlobStoragePath(hadoopConf, path)
+        && !useBlobStorageAsScratchDir(hadoopConf)) {
+        getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)
 
 Review comment:
   cc @dongjoon-hyun @wangyum
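
For illustration only (not part of the patch): a rough sketch of the decision the hunk above appears to make for newer Hive versions. If the target path uses a blobstore scheme and the blobstore is not to be used as the scratch dir, temporary data goes to the session scratch dir; otherwise it stays in a staging dir next to the table. The diff is truncated before the `else` branch, so that fallback is an assumption, and every name below (`TempPathChoiceSketch`, `TempLocation`, `chooseTempLocation`) is hypothetical.

```scala
import java.net.URI
import java.util.Locale

object TempPathChoiceSketch {
  sealed trait TempLocation
  // Non-blobstore scratch space (for example the HDFS session path).
  case object SessionScratchDir extends TempLocation
  // Staging directory under the target path (assumed fallback behaviour).
  case object StagingDirNextToTable extends TempLocation

  def isBlobStorageScheme(supported: Seq[String], scheme: String): Boolean =
    supported.map(_.toLowerCase(Locale.ROOT)).contains(scheme.toLowerCase(Locale.ROOT))

  def chooseTempLocation(
      target: URI,
      supportedSchemes: Seq[String],
      useBlobStorageAsScratchDir: Boolean): TempLocation = {
    // A URI without a scheme (e.g. a bare local path) is treated as non-blobstore.
    val scheme = Option(target.getScheme).getOrElse("")
    if (isBlobStorageScheme(supportedSchemes, scheme) && !useBlobStorageAsScratchDir) {
      SessionScratchDir
    } else {
      StagingDirNextToTable
    }
  }
}
```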



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594379614
 
 
   **[Test build #119273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119273/testReport)** for PR 27690 at commit [`76a189e`](https://github.com/apache/spark/commit/76a189e7070c0e279e599f88f97fe76f218e055b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593253217
 
 
   **[Test build #119149 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119149/testReport)** for PR 27690 at commit [`3ec4418`](https://github.com/apache/spark/commit/3ec4418e287c800242e890fa2259577246ab1a29).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton `



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592954621
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592235935
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591943929
 
 
   Could you show us performance numbers with this change?



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385496980
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
+    path != null && isBlobStorageScheme(hadoopConf, Option(path.toUri.getScheme).getOrElse(""))
+  }
+
+  def isBlobStorageScheme(
+      hadoopConf: Configuration,
+      scheme: String): Boolean = {
+    val supportedBlobSchemes = hadoopConf.get("hive.blobstore.supported.schemes", "s3,s3a,s3n")
+    supportedBlobSchemes.toLowerCase(Locale.ROOT)
+      .split(",")
+      .map(_.trim)
+      .contains(scheme.toLowerCase(Locale.ROOT))
 
 Review comment:
   I replaced this with `Utils.stringToSeq`.
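
For illustration only (not the patch itself): a sketch of what the replacement might look like. The local `stringToSeq` below is an assumption about what Spark's `Utils.stringToSeq` does (split on commas, trim each entry, drop empty entries); it is defined here only so the example runs without Spark on the classpath. The object name `StringToSeqSketch` is hypothetical.

```scala
import java.util.Locale

object StringToSeqSketch {
  // Local stand-in, assumed to mirror Spark's Utils.stringToSeq:
  // split on commas, trim each entry, and drop empty entries.
  def stringToSeq(str: String): Seq[String] =
    str.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  def isBlobStorageScheme(supportedSchemes: String, scheme: String): Boolean =
    stringToSeq(supportedSchemes.toLowerCase(Locale.ROOT))
      .contains(scheme.toLowerCase(Locale.ROOT))
}
```

One difference from the hand-rolled `split`/`map`/`contains` chain in this sketch: empty entries are filtered out, so an empty scheme list no longer matches an empty scheme.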



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591946823
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23774/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592235944
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119052/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600344859
 
 
   **[Test build #119946 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119946/testReport)** for PR 27690 at commit [`76a189e`](https://github.com/apache/spark/commit/76a189e7070c0e279e599f88f97fe76f218e055b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385119355
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFileSuite.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.scalatest.GivenWhenThen
+import org.scalatestplus.mockito.MockitoSugar
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton
+    with MockitoSugar with GivenWhenThen {
+  test("getMRTmpPath method") {
+    val insertIntoHiveTable = new InsertIntoHiveTable(
+      mock[CatalogTable], Map.empty[String, Option[String]],
+      mock[LogicalPlan], true, false, Seq.empty[String])
+    val hadoopConf = new Configuration()
+    val scratchDir = "/tmp/hive_scratch"
+    val sessionScratchDir = "/tmp/hive/user_a/session_b"
+
+    Given(s"sessionScratchDir = '$sessionScratchDir' & scratchDir = '$scratchDir'")
+    When("get the path from getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)")
 
 Review comment:
   Could you follow the format of the other tests? For example, unless you have a strong preference, I think it would be better to just use `assert` instead of `GivenWhenThen`, as the other tests do.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r387429262
 
 

 ##########
 File path: sql/hive/pom.xml
 ##########
 @@ -189,6 +189,11 @@
       <artifactId>scalacheck_${scala.binary.version}</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.mockito</groupId>
+      <artifactId>mockito-core</artifactId>
+      <scope>test</scope>
 
 Review comment:
   Thanks, I removed it.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593205712
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23891/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592957542
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119124/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592189546
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23798/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592942932
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592940250
 
 
   **[Test build #119123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119123/testReport)** for PR 27690 at commit [`3b75929`](https://github.com/apache/spark/commit/3b75929dc25e24120c43ef260f10434eb45769f5).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592957538
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591968407
 
 
   We probably need to document how to use this functionality somewhere.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396186608
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   Thanks, @moomindani. How does Hive behave when `_hive.hdfs.session.path` is not set?



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396284464
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -751,6 +751,20 @@ object SQLConf {
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
     .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_BLOBSTORE_SUPPORTED_SCHEMES =
+    buildConf("spark.sql.hive.blobstore.supported.schemes")
+      .doc("Comma-separated list of supported blobstore schemes.")
 
 Review comment:
   Thank you for the comment. The current explanation is the same sentence as Hive's. Based on your comment, I will add an explanation like `If you disable this parameter, Spark first writes the data to the scratch dir and then moves it to the blobstore, because moving it within the blobstore is expensive.`
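
   For illustration, a minimal sketch of how a comma-separated scheme list such as the value of `spark.sql.hive.blobstore.supported.schemes` could be checked against a path. This is not the PR's implementation: the object name `BlobSchemeCheck`, the helper `parseSchemes`, and the example value `s3,s3a,s3n` are assumptions, and `java.net.URI` stands in for Hadoop's `Path` so the snippet runs without Hadoop on the classpath.

```scala
import java.net.URI
import java.util.Locale

// Hypothetical helper, not the BlobStorageUtils from the PR.
object BlobSchemeCheck {
  // Split a comma-separated list into a normalized set of schemes.
  // An empty string yields an empty set, i.e. nothing counts as blobstore.
  def parseSchemes(csv: String): Set[String] =
    csv.split(",").map(_.trim.toLowerCase(Locale.ROOT)).filter(_.nonEmpty).toSet

  // Case-insensitive membership test; a null scheme is never a blobstore.
  def isBlobStorageScheme(supported: Set[String], scheme: String): Boolean =
    scheme != null && supported.contains(scheme.toLowerCase(Locale.ROOT))

  // A path without a scheme (e.g. "/tmp/hive") has a null URI scheme,
  // so it is mapped to "" and treated as non-blobstore.
  def isBlobStoragePath(supported: Set[String], path: String): Boolean =
    path != null &&
      isBlobStorageScheme(supported, Option(new URI(path).getScheme).getOrElse(""))
}
```

   With `val supported = BlobSchemeCheck.parseSchemes("s3,s3a,s3n")`, a path like `s3a://bucket/warehouse/t` would be classified as blobstore, while `hdfs://nn:8020/tmp/hive` and the scheme-less `/tmp/hive` would not.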



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593219165
 
 
   **[Test build #119152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119152/testReport)** for PR 27690 at commit [`05dfedb`](https://github.com/apache/spark/commit/05dfedb345bc5314453cc8ae701de23f93991833).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593270980
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119152/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594380411
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119273/
   Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385100221
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
+    path != null && isBlobStorageScheme(hadoopConf, Option(path.toUri.getScheme).getOrElse(""))
+  }
+
+  def isBlobStorageScheme(
+      hadoopConf: Configuration,
+      scheme: String): Boolean = {
 
 Review comment:
   ditto



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593205447
 
 
   **[Test build #119149 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119149/testReport)** for PR 27690 at commit [`3ec4418`](https://github.com/apache/spark/commit/3ec4418e287c800242e890fa2259577246ab1a29).



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593270211
 
 
   **[Test build #119152 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119152/testReport)** for PR 27690 at commit [`05dfedb`](https://github.com/apache/spark/commit/05dfedb345bc5314453cc8ae701de23f93991833).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [WIP][SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-590659867
 
 
   Can one of the admins verify this patch?



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592940348
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23865/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592302800
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23811/
   Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385497038
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
 
 Review comment:
   Removed unneeded line breaks.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592388854
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603002304
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120223/
   Test FAILed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603001894
 
 
   **[Test build #120223 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120223/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).
    * This patch **fails PySpark pip packaging tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386081170
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFileSuite.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.scalatestplus.mockito.MockitoSugar
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton with MockitoSugar {
+  test("sessionScratchDir = '/tmp/hive/user_a/session_b' & scratchDir = '/tmp/hive_scratch'") {
+    val insertIntoHiveTable = new InsertIntoHiveTable(
+      mock[CatalogTable], Map.empty[String, Option[String]],
+      mock[LogicalPlan], true, false, Seq.empty[String])
 
 Review comment:
   I think you don't need to use Mockito here. Could you follow the existing tests instead? E.g., https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala#L558



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603653204
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603653210
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120296/
   Test PASSed.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396185493
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
 +    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
 
 Review comment:
   Why do we need `Option` here if it becomes an empty string when `_hive.hdfs.session.path` is empty?
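
   For reference, a minimal sketch of how the fallback expression in the quoted line behaves for null, empty, and non-empty inputs. The object name `SessionPathFallback` and the `resolveNonNull` variant are hypothetical and only illustrate the trade-off being asked about; the `resolve` body is copied from the diff.

```scala
// Hypothetical wrapper around the expression from the quoted diff.
object SessionPathFallback {
  // Same expression as in getMRTmpPath: Option(...) turns null into None,
  // filterNot drops the empty string, getOrElse falls back to scratchDir.
  def resolve(sessionScratchDir: String, scratchDir: String): String =
    Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)

  // Variant without Option. It is equivalent only if the caller guarantees
  // a non-null value (e.g. by always passing "" as the default);
  // it throws a NullPointerException on null.
  def resolveNonNull(sessionScratchDir: String, scratchDir: String): String =
    if (sessionScratchDir.isEmpty) scratchDir else sessionScratchDir
}
```

   Since the call site in the diff reads the setting with `""` as its default, the simpler variant would likely suffice there; `Option` only adds protection for callers that may pass null.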



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594305785
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396799099
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
+    logDebug(s"session path '${sessionPath.toString}' is used")
+
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)
+    logDebug(s"MR scratch dir '$mrScratchDir/-mr-10000' is used")
+    new Path(mrScratchDir, "-mr-10000")
+  }
+
+  private def isBlobStoragePath(path: Path): Boolean = {
+    path != null && isBlobStorageScheme(Option(path.toUri.getScheme).getOrElse(""))
 
 Review comment:
   As far as I tested, it worked even when `fs.default.name` is set to `s3`.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592954622
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119123/
   Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385105052
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -124,11 +147,26 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val hiveVersion = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog].client.version
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
+    logDebug(s"path '${path.toString}' is used")
+    logDebug(s"staging dir '$stagingDir' is used")
+    logDebug(s"scratch dir '$scratchDir' is used")
 
 Review comment:
   nit: Can you pack the three `logDebug`s into one?
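   For illustration, a single combined message could be built as in the sketch below. `LogMessageSketch` and `tmpPathDebugMessage` are hypothetical names, not part of the patch; in the trait itself the resulting string would simply be passed to one `logDebug` call.

```scala
// Hypothetical helper, for illustration only: builds one debug message
// covering the three values that were previously logged separately.
object LogMessageSketch {
  def tmpPathDebugMessage(path: String, stagingDir: String, scratchDir: String): String =
    s"path '$path', staging dir '$stagingDir', scratch dir '$scratchDir' are used"
}
```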



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592235935
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600345509
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119946/
   Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386081631
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +162,23 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
+    logDebug(s"path '${path.toString}', staging dir '$stagingDir', " +
+      s"scratch dir '$scratchDir', session scratch dir '$sessionScratchDir' are used")
+
     if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
       oldVersionExternalTempPath(path, hadoopConf, scratchDir)
     } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {
-      newVersionExternalTempPath(path, hadoopConf, stagingDir)
+      // HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
+      // Copied from Context.java#getTempDirForPath of Hive 2.3
+      if (isBlobStoragePath(hadoopConf, path)
+        && !useBlobStorageAsScratchDir(hadoopConf)) {
+        getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)
 
 Review comment:
   Could you add a new conf under the `spark.sql.hive.` namespace for this new feature? Also, please follow the naming rule: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala#L20-L47
   
   By doing so, [sql/create-docs.sh](https://github.com/apache/spark/blob/master/sql/create-docs.sh) could automatically generate a doc for the conf.



[GitHub] [spark] maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592188716
 
 
   retest this please



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593253909
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119149/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-602925290
 
 
   **[Test build #120223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120223/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385188867
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +97,30 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
 
 Review comment:
   Ah, I see. IMO newer code is better. Is the `getMRTmpPath` code different between 2.3 and 3.x?



[GitHub] [spark] kiszk commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
kiszk commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003139
 
 
   retest this please



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593219489
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592940342
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003384
 
 
   **[Test build #121464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121464/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593270980
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119152/
   Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386026628
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFileSuite.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.scalatest.GivenWhenThen
+import org.scalatestplus.mockito.MockitoSugar
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton
+    with MockitoSugar with GivenWhenThen {
 
 Review comment:
   To test this patch's exact behavior, we would need to check that the intermediate files are written to HDFS rather than to the blobstore. That is not easy to test, which is why I am checking each path here instead.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592388854
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396178891
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   If `_hive.hdfs.session.path` is empty, `getMRTmpPath` uses `scratchDir` instead of `sessionScratchDir` in this line: `val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)`.
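   The fallback can be sketched in isolation as follows. This is a minimal illustration rather than the patch itself: `SessionPathSketch` and `resolveSessionPath` are hypothetical names, and only the `Option(...)` expression is taken from the PR.

```scala
// Hypothetical wrapper around the expression quoted from the patch.
object SessionPathSketch {
  // Returns the Hive session path when it is set; a null or empty value
  // falls back to the value of 'hive.exec.scratchdir'.
  def resolveSessionPath(sessionScratchDir: String, scratchDir: String): String =
    Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
}
```

   With this shape, both a missing (`null`) and an empty session path resolve to `scratchDir`, so the caller never has to distinguish the two cases.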



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592942860
 
 
   **[Test build #119124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119124/testReport)** for PR 27690 at commit [`793bacb`](https://github.com/apache/spark/commit/793bacbe7a16d692097656af9ad18f8d1c7ea1f9).



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385101575
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
+    path != null && isBlobStorageScheme(hadoopConf, Option(path.toUri.getScheme).getOrElse(""))
+  }
+
+  def isBlobStorageScheme(
+      hadoopConf: Configuration,
+      scheme: String): Boolean = {
+    val supportedBlobSchemes = hadoopConf.get("hive.blobstore.supported.schemes", "s3,s3a,s3n")
+    supportedBlobSchemes.toLowerCase(Locale.ROOT)
+      .split(",")
+      .map(_.trim)
+      .contains(scheme.toLowerCase(Locale.ROOT))
 
 Review comment:
   Can we use `Utils.stringToSeq` here?
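   A rough sketch of what that could look like is below. To keep it self-contained, it defines a local `stringToSeq` that is assumed to behave like Spark's `Utils.stringToSeq` (split on commas, trim, drop empty entries); `BlobSchemeSketch` is a hypothetical name, and the supported-schemes string is passed in directly instead of being read from a Hadoop `Configuration`.

```scala
import java.util.Locale

object BlobSchemeSketch {
  // Local stand-in, assumed to match Spark's Utils.stringToSeq:
  // split on commas, trim each entry, and drop empty entries.
  def stringToSeq(str: String): Seq[String] =
    str.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Case-insensitive membership test against the comma-separated scheme list.
  def isBlobStorageScheme(supportedSchemes: String, scheme: String): Boolean =
    stringToSeq(supportedSchemes.toLowerCase(Locale.ROOT))
      .contains(scheme.toLowerCase(Locale.ROOT))
}
```

   Dropping empty entries also means that an empty scheme (a path without one) can never match, even if the configured list contains stray commas.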



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r395530390
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -751,6 +751,20 @@ object SQLConf {
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
     .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_BLOBSTORE_SUPPORTED_SCHEMES =
+    buildConf("spark.sql.hive.blobstore.supported.schemes")
+      .doc("Comma-separated list of supported blobstore schemes.")
 
 Review comment:
   Where does this feature move the staging directory to, HDFS? Can you fix the documentation to describe what the configuration does? It seems to write the data to the staging dir first and then move it to S3, because renaming on S3 is expensive.
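   For reference, the choice of temporary location discussed in this PR can be sketched as below. This is a simplified illustration under assumptions, not the patch's code: `TmpPathChoiceSketch`, `TmpLocation`, and `choose` are hypothetical names, paths are plain strings instead of Hadoop `Path`s, and the "staging dir next to the table" branch is assumed to correspond to the existing `newVersionExternalTempPath` behavior.

```scala
import java.net.URI
import java.util.Locale

object TmpPathChoiceSketch {
  sealed trait TmpLocation
  // Temporary data goes under the Hive scratch dir (e.g. on HDFS).
  case object ScratchDir extends TmpLocation
  // Temporary data goes into a staging dir next to the table location.
  case object StagingDirNextToTable extends TmpLocation

  // A path counts as blob storage when its URI scheme is in the given set.
  def isBlobStoragePath(path: String, blobSchemes: Set[String]): Boolean = {
    val scheme = Option(new URI(path).getScheme).getOrElse("")
    blobSchemes.contains(scheme.toLowerCase(Locale.ROOT))
  }

  // Use the scratch dir only for blob-storage tables, and only when the
  // blobstore itself is not allowed to serve as the scratch dir.
  def choose(
      tablePath: String,
      blobSchemes: Set[String],
      useBlobStoreAsScratchDir: Boolean): TmpLocation =
    if (isBlobStoragePath(tablePath, blobSchemes) && !useBlobStoreAsScratchDir) ScratchDir
    else StagingDirNextToTable
}
```

   The point of the first branch is that intermediate files are written and renamed on a file system where rename is cheap, and only the final output is moved to the blobstore.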



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600231119
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593205712
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23891/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603653210
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120296/
   Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r396829010
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -97,7 +99,43 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
       options = Map.empty)
   }
 
-  protected def getExternalTmpPath(
+  /*
+   * Mostly copied from Context.java#getMRTmpPath of Hive 2.3
+   *
+   */
+  def getMRTmpPath(
+      hadoopConf: Configuration,
+      sessionScratchDir: String,
+      scratchDir: String): Path = {
+
+    // Hive's getMRTmpPath uses nonLocalScratchPath + '-mr-10000',
+    // which is ruled by 'hive.exec.scratchdir' including file system.
+    // This is the same as Spark's #oldVersionExternalTempPath
+    // Only difference between #oldVersionExternalTempPath and Hive 2.3.0's is HIVE-7090
+    // HIVE-7090 added user_name/session_id on top of 'hive.exec.scratchdir'
+    // Here it uses session_path unless it's empty, otherwise uses scratchDir
+    val sessionPath = Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)
+    logDebug(s"session path '${sessionPath.toString}' is used")
+
+    val mrScratchDir = oldVersionExternalTempPath(new Path(sessionPath), hadoopConf, sessionPath)
+    logDebug(s"MR scratch dir '$mrScratchDir/-mr-10000' is used")
+    new Path(mrScratchDir, "-mr-10000")
+  }
+
+  private def isBlobStoragePath(path: Path): Boolean = {
+    path != null && isBlobStorageScheme(Option(path.toUri.getScheme).getOrElse(""))
 
 Review comment:
   I believe there are only two cases where S3 is still used as the scratch dir for an S3 path, even with `spark.sql.hive.blobstore.use.blobstore.as.scratchdir=false`:
   - When `_hive.hdfs.session.path` is set to an `s3` location
   - When `_hive.hdfs.session.path` is not set and `hive.exec.scratchdir` is set to an `s3` location
   
   In both cases, these params can be configured by users, so it won't be an issue.
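
The two cases follow from the fallback order in `getMRTmpPath`. A minimal sketch, assuming the session path takes precedence over `hive.exec.scratchdir` as in the diff above; `ScratchDirSketch`, `chooseScratchRoot`, and `isBlobScheme` are hypothetical names, and `java.net.URI` stands in for Hadoop's `Path`:

```scala
import java.net.URI
import java.util.Locale

object ScratchDirSketch {
  // Hypothetical fixed list; the PR reads the schemes from configuration.
  val blobSchemes: Set[String] = Set("s3", "s3a", "s3n")

  // The session path wins unless it is null or empty; otherwise use scratchDir.
  def chooseScratchRoot(sessionScratchDir: String, scratchDir: String): String =
    Option(sessionScratchDir).filterNot(_.isEmpty).getOrElse(scratchDir)

  // True when the location carries one of the blobstore schemes.
  def isBlobScheme(location: String): Boolean = {
    val scheme = Option(new URI(location).getScheme).getOrElse("")
    blobSchemes.contains(scheme.toLowerCase(Locale.ROOT))
  }
}
```

With this order, an S3 `hive.exec.scratchdir` only matters when the session path is empty, and an S3 session path always wins.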



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r395525840
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +163,22 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   Shall we assert on it, or throw a proper exception, when `_hive.hdfs.session.path` is not set?
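
One way to read this suggestion is a fail-fast check, sketched below. This is only an illustration of the option raised here, not what the PR does (the diff falls back to `hive.exec.scratchdir`); `SessionPathCheck` and `requireSessionPath` are hypothetical names:

```scala
object SessionPathCheck {
  // Hypothetical helper: return the session path, or fail with a descriptive
  // message when it is null, empty, or blank.
  def requireSessionPath(sessionScratchDir: String): String =
    Option(sessionScratchDir).map(_.trim).filter(_.nonEmpty).getOrElse {
      throw new IllegalStateException(
        "_hive.hdfs.session.path is not set; cannot resolve the session scratch directory")
    }
}
```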



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603652551
 
 
   **[Test build #120296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120296/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593219165
 
 
   **[Test build #119152 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119152/testReport)** for PR 27690 at commit [`05dfedb`](https://github.com/apache/spark/commit/05dfedb345bc5314453cc8ae701de23f93991833).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603585544
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-593253904
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385116300
 
 

 ##########
 File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFileSuite.scala
 ##########
 @@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.scalatest.GivenWhenThen
+import org.scalatestplus.mockito.MockitoSugar
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.catalyst.catalog.CatalogTable
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.hive.test.TestHiveSingleton
+
+class SaveAsHiveFileSuite extends QueryTest with TestHiveSingleton
+    with MockitoSugar with GivenWhenThen {
 
 Review comment:
   Can you write the tests more simply, w/o mockito? For example, how about just checking whether the output files exist, like the other tests do? e.g.,
   https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala#L2004
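
A file-existence check of the kind suggested here could look roughly like the sketch below. It is a plain-JVM illustration rather than code from `HiveDDLSuite`; `OutputFileCheck` and `hasDataFiles` are hypothetical names, and skipping names that start with `.` or `_` is an assumption about which files count as marker or hidden files:

```scala
import java.io.File

object OutputFileCheck {
  // True when `dir` is a directory that contains at least one regular file
  // that is not a hidden or marker file (names starting with '.' or '_').
  def hasDataFiles(dir: File): Boolean =
    Option(dir.listFiles()).getOrElse(Array.empty[File]).exists { f =>
      f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_")
    }
}
```

A test could then assert `hasDataFiles` on the table location after the insert, with no mocks involved.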



[GitHub] [spark] HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r395530390
 
 

 ##########
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##########
 @@ -751,6 +751,20 @@ object SQLConf {
     .checkValues(HiveCaseSensitiveInferenceMode.values.map(_.toString))
     .createWithDefault(HiveCaseSensitiveInferenceMode.NEVER_INFER.toString)
 
+  val HIVE_BLOBSTORE_SUPPORTED_SCHEMES =
+    buildConf("spark.sql.hive.blobstore.supported.schemes")
+      .doc("Comma-separated list of supported blobstore schemes.")
 
 Review comment:
   Where does this feature move the staging directory to, HDFS? Can you fix the documentation to describe what the configuration does? It seems to write the data to the staging dir first and then move it to S3, because moving it on S3 is expensive.
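
The flow as the reviewer understands it (write to a staging location first, then transfer the finished output in one step) can be sketched as follows. Local directories stand in for both the staging file system and the S3 destination, and `StagedWriteSketch` and `writeViaStaging` are hypothetical names, so this shows only the ordering of the steps, not the PR's implementation:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object StagedWriteSketch {
  // Write the data under `stagingDir`, then move the finished file into
  // `finalDir`. Returns the path of the file at its final location.
  def writeViaStaging(stagingDir: Path, finalDir: Path, name: String, data: String): Path = {
    Files.createDirectories(stagingDir)
    Files.createDirectories(finalDir)
    val staged = stagingDir.resolve(name)
    Files.write(staged, data.getBytes(StandardCharsets.UTF_8))
    Files.move(staged, finalDir.resolve(name), StandardCopyOption.REPLACE_EXISTING)
  }
}
```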



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591946823
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/23774/
   Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385100157
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
 
 Review comment:
   nit: 
   ```
   def isBlobStoragePath(hadoopConf: Configuration, path: Path): Boolean = {
   ```



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592388097
 
 
   **[Test build #119066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119066/testReport)** for PR 27690 at commit [`17693b3`](https://github.com/apache/spark/commit/17693b364e5cb00c07c04f6c5e7f430329584e6f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592388861
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119066/
   Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385100157
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
 
 Review comment:
   nit: no line break required here;
   ```
   def isBlobStoragePath(hadoopConf: Configuration, path: Path): Boolean = {
   ```



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r384856247
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
+    path != null && isBlobStorageScheme(hadoopConf, Option(path.toUri.getScheme).getOrElse(""))
+  }
+
+  def isBlobStorageScheme(
+      hadoopConf: Configuration,
+      scheme: String): Boolean = {
+    val supportedBlobSchemes = hadoopConf.get("hive.blobstore.supported.schemes", "s3,s3a,s3n")
+    supportedBlobSchemes.toLowerCase(Locale.ROOT)
+      .split(",")
+      .map(_.trim)
+      .toList
 
 Review comment:
   Thanks for the comment. Right, we do not need `toList`.
   I removed it and confirmed that the unit tests still succeed.
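
Dropping `toList` works because a Scala `Array` already supports `contains`. A minimal sketch, assuming the chain ends in a `contains` check (the rest of the method is not shown in the quoted diff); `BlobSchemeSketch` is a hypothetical name, and the supported schemes are passed as a plain `String` to avoid a Hadoop dependency:

```scala
import java.util.Locale

object BlobSchemeSketch {
  // `supportedSchemes` mirrors the comma-separated value of
  // hive.blobstore.supported.schemes, e.g. "s3,s3a,s3n".
  def isBlobStorageScheme(supportedSchemes: String, scheme: String): Boolean =
    supportedSchemes.toLowerCase(Locale.ROOT)
      .split(",")
      .map(_.trim)
      .contains(scheme.toLowerCase(Locale.ROOT))
}
```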



[GitHub] [spark] maropu removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591943929
 
 
   Could you show us performance numbers w/ this change?



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594305442
 
 
   **[Test build #119273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119273/testReport)** for PR 27690 at commit [`76a189e`](https://github.com/apache/spark/commit/76a189e7070c0e279e599f88f97fe76f218e055b).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616047416
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592235698
 
 
   **[Test build #119052 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119052/testReport)** for PR 27690 at commit [`b417289`](https://github.com/apache/spark/commit/b417289729ca187658be903128b6c2c020dafdb0).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594380406
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594305785
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603585031
 
 
   **[Test build #120296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120296/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616047423
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121464/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-602925727
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-600345505
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616003522
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386197390
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +162,23 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
+    logDebug(s"path '${path.toString}', staging dir '$stagingDir', " +
+      s"scratch dir '$scratchDir', session scratch dir '$sessionScratchDir' are used")
+
     if (hiveVersionsUsingOldExternalTempPath.contains(hiveVersion)) {
       oldVersionExternalTempPath(path, hadoopConf, scratchDir)
     } else if (hiveVersionsUsingNewExternalTempPath.contains(hiveVersion)) {
-      newVersionExternalTempPath(path, hadoopConf, stagingDir)
+      // HIVE-14270: Write temporary data to HDFS when doing inserts on tables located on S3
+      // Copied from Context.java#getTempDirForPath of Hive 2.3
+      if (isBlobStoragePath(hadoopConf, path)
+        && !useBlobStorageAsScratchDir(hadoopConf)) {
+        getMRTmpPath(hadoopConf, sessionScratchDir, scratchDir)
 
 Review comment:
   Although I used `spark.hadoop.hive.blobstore.*` in the initial commit, that was only for consistency with Hive.
   Since I have no technical reason to keep it, I changed it to use SQLConf so that it follows the naming rule.
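
   For illustration, the branch in the hunk above can be sketched without Hive or Hadoop on the classpath. This is a minimal sketch, not the code in the PR: `TempPathSketch`, `chooseTempLocation`, and the two case classes are hypothetical names, plain strings and `java.net.URI` stand in for Hadoop's `Path`, and the fallback from the session scratch directory to `hive.exec.scratchdir` is an assumption about what `getMRTmpPath` does with its two directory arguments.

```scala
import java.net.URI
import java.util.Locale

// Hypothetical sketch of the temp-location decision; not the PR's implementation.
object TempPathSketch {
  sealed trait TempLocation
  // Temporary data is written under a non-blobstore scratch directory.
  final case class ScratchDirTemp(base: String) extends TempLocation
  // Temporary data is written to a staging directory next to the destination.
  final case class StagingDirTemp(destination: URI, stagingDir: String) extends TempLocation

  def chooseTempLocation(
      destination: URI,
      blobSchemes: Set[String],
      useBlobStorageAsScratchDir: Boolean,
      sessionScratchDir: String,
      scratchDir: String,
      stagingDir: String): TempLocation = {
    val isBlob = Option(destination.getScheme)
      .exists(s => blobSchemes.contains(s.toLowerCase(Locale.ROOT)))
    if (isBlob && !useBlobStorageAsScratchDir) {
      // Assumption: use the generic scratch dir when no session path is set.
      ScratchDirTemp(if (sessionScratchDir.nonEmpty) sessionScratchDir else scratchDir)
    } else {
      StagingDirTemp(destination, stagingDir)
    }
  }
}
```

   With this shape, an `s3a://` destination is staged under the scratch directory unless the option to use blob storage as the scratch directory is enabled, while an `hdfs://` destination always keeps the staging-directory behavior.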



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-594380406
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592189007
 
 
   **[Test build #119052 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119052/testReport)** for PR 27690 at commit [`b417289`](https://github.com/apache/spark/commit/b417289729ca187658be903128b6c2c020dafdb0).



[GitHub] [spark] AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592189538
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603585552
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25006/
   Test PASSed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386081505
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/SaveAsHiveFile.scala
 ##########
 @@ -125,10 +162,23 @@ private[hive] trait SaveAsHiveFile extends DataWritingCommand {
     val stagingDir = hadoopConf.get("hive.exec.stagingdir", ".hive-staging")
     val scratchDir = hadoopConf.get("hive.exec.scratchdir", "/tmp/hive")
 
+    // Hive sets session_path as HDFS_SESSION_PATH_KEY(_hive.hdfs.session.path) in hive config
+    val sessionScratchDir = externalCatalog.unwrapped.asInstanceOf[HiveExternalCatalog]
+      .client.getConf("_hive.hdfs.session.path", "")
 
 Review comment:
   Could you add a new conf under the `spark.sql.hive.` namespace for this new feature? Also, please follow the naming rule: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala#L20-L47
   
   That way, [sql/create-docs.sh](https://github.com/apache/spark/blob/master/sql/create-docs.sh) can automatically generate documentation for the conf.
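
   The reasoning behind this request can be shown with a small stand-alone sketch: when each conf is registered as an entry that carries its own key, default, and doc string, documentation can be produced from the registry instead of being written by hand. Everything below is hypothetical and is not Spark's `SQLConf` or `ConfigEntry` API; in particular the key `spark.sql.hive.exampleBlobStore.useAsScratchDir` is invented for this example only.

```scala
import scala.collection.mutable

// Hypothetical, simplified registry; not Spark's SQLConf/ConfigEntry API.
object ConfRegistrySketch {
  final case class BooleanConf(key: String, doc: String, default: Boolean)

  private val entries = mutable.LinkedHashMap.empty[String, BooleanConf]

  // Registers an entry and enforces the namespace asked for in the review.
  def register(entry: BooleanConf): BooleanConf = {
    require(entry.key.startsWith("spark.sql.hive."), s"unexpected namespace: ${entry.key}")
    require(!entries.contains(entry.key), s"duplicate conf: ${entry.key}")
    entries(entry.key) = entry
    entry
  }

  // Reads the value from user settings, falling back to the declared default.
  def get(settings: Map[String, String], entry: BooleanConf): Boolean =
    settings.get(entry.key).map(_.trim.toBoolean).getOrElse(entry.default)

  // One line per registered entry: key | default | doc.
  def docLines: Seq[String] =
    entries.values.map(e => s"${e.key} | ${e.default} | ${e.doc}").toSeq

  // Hypothetical key name, chosen only for this example.
  val ExampleConf: BooleanConf = register(BooleanConf(
    key = "spark.sql.hive.exampleBlobStore.useAsScratchDir",
    doc = "Whether temporary data for blob storage tables is staged on blob storage.",
    default = false))
}
```

   Calling `ConfRegistrySketch.docLines` then yields one line per registered conf, which is the kind of input a doc generator can consume.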



[GitHub] [spark] moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
moomindani commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r385497093
 
 

 ##########
 File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/BlobStorageUtils.scala
 ##########
 @@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.execution
+
+import java.util.Locale
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+
+object BlobStorageUtils {
+  def isBlobStoragePath(
+      hadoopConf: Configuration,
+      path: Path): Boolean = {
+    path != null && isBlobStorageScheme(hadoopConf, Option(path.toUri.getScheme).getOrElse(""))
+  }
+
+  def isBlobStorageScheme(
+      hadoopConf: Configuration,
+      scheme: String): Boolean = {
 
 Review comment:
   Removed unneeded line breaks.
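
   As a rough illustration of the scheme check in this file, here is a sketch that runs without Hadoop on the classpath. It is not the body of the PR's `BlobStorageUtils` (the hunk above is cut off before the implementation): `BlobSchemeSketch` is a hypothetical name, a plain `Map` and `java.net.URI` stand in for Hadoop's `Configuration` and `Path`, and the key `hive.blobstore.supported.schemes` with the default `s3,s3a,s3n` is assumed to match Hive's setting.

```scala
import java.net.URI
import java.util.Locale

// Hypothetical stand-in for the scheme check; uses a Map instead of a Hadoop Configuration.
object BlobSchemeSketch {
  // Assumption: same key and default as Hive's blobstore scheme setting.
  val SupportedSchemesKey = "hive.blobstore.supported.schemes"
  val DefaultSchemes = "s3,s3a,s3n"

  def isBlobStorageScheme(conf: Map[String, String], scheme: String): Boolean = {
    // Parse the comma-separated list, ignoring case, whitespace, and empty items.
    val supported = conf.getOrElse(SupportedSchemesKey, DefaultSchemes)
      .split(",")
      .map(_.trim.toLowerCase(Locale.ROOT))
      .filter(_.nonEmpty)
      .toSet
    Option(scheme).exists(s => supported.contains(s.toLowerCase(Locale.ROOT)))
  }

  def isBlobStoragePath(conf: Map[String, String], path: URI): Boolean =
    // A path without a scheme (for example "/tmp/hive") is never blob storage.
    path != null && isBlobStorageScheme(conf, Option(path.getScheme).getOrElse(""))
}
```

   Comparing schemes in lower case with `Locale.ROOT` keeps the check independent of the JVM's default locale, which is presumably why the file imports `java.util.Locale`.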



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592047031
 
 
   **[Test build #119027 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119027/testReport)** for PR 27690 at commit [`b417289`](https://github.com/apache/spark/commit/b417289729ca187658be903128b6c2c020dafdb0).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-592302469
 
 
   **[Test build #119066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119066/testReport)** for PR 27690 at commit [`17693b3`](https://github.com/apache/spark/commit/17693b364e5cb00c07c04f6c5e7f430329584e6f).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-602925731
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24936/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603002292
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#discussion_r386301996
 
 

 ##########
 File path: sql/hive/pom.xml
 ##########
 @@ -189,6 +189,11 @@
       <artifactId>scalacheck_${scala.binary.version}</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.mockito</groupId>
+      <artifactId>mockito-core</artifactId>
+      <scope>test</scope>
 
 Review comment:
   Please remove this dependency.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-591946463
 
 
   **[Test build #119027 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119027/testReport)** for PR 27690 at commit [`b417289`](https://github.com/apache/spark/commit/b417289729ca187658be903128b6c2c020dafdb0).



[GitHub] [spark] AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-616047423
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121464/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27690: [SPARK-21514][SQL] Added a new option to use non-blobstore storage when writing into blobstore storage
URL: https://github.com/apache/spark/pull/27690#issuecomment-603585031
 
 
   **[Test build #120296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120296/testReport)** for PR 27690 at commit [`073b2e5`](https://github.com/apache/spark/commit/073b2e5aa299225dae830542247be0f15488eba9).


