You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/07 05:40:38 UTC

[GitHub] [spark] sarutak opened a new pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

sarutak opened a new pull request #32074:
URL: https://github.com/apache/spark/pull/32074


   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   This PR fixes an issue that `LIST {FILES/JARS/ARCHIVES} path1, path2, ...` cannot list all paths if at least one path is quoted.
   An example here.
   ```
   ADD FILE /tmp/test1;
   ADD FILE /tmp/test2;
   
   LIST FILES /tmp/test1 /tmp/test2;
   file:/tmp/test1
   file:/tmp/test2
   
   LIST FILES /tmp/test1 "/tmp/test2";
   file:/tmp/test2
   ```
   
   In this example, the second `LIST FILES` doesn't show `file:/tmp/test1`.
   
   To resolve this issue, I modified the syntax rule to be able to handle this case.
   I also changed `SparkSQLParser` to be able to handle paths which contains white spaces.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   This is a bug.
   I also have a plan which extends `ADD FILE/JAR/ARCHIVE` to take multiple paths like Hive and the syntax rule change is necessary for that.
   
   ### Does this PR introduce _any_ user-facing change?
   <!--
   Note that it means *any* user-facing change including all aspects such as the documentation fix.
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If possible, please also clarify if this is a user-facing change compared to the released Spark versions or within the unreleased branches such as master.
   If no, write 'No'.
   -->
   Yes. Users can pass quoted paths when using `ADD FILE/JAR/ARCHIVE`.
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   New test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816371261


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41688/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r611321847



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
##########
@@ -67,17 +65,16 @@ case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends Runn
   }
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val fileList = sparkSession.sparkContext.listFiles()
-    if (files.size > 0) {
-      files.map { f =>
-        val uri = new URI(f)
-        val schemeCorrectedPath = uri.getScheme match {
-          case null | "local" => new File(f).getCanonicalFile.toURI.toString
-          case _ => f
-        }
-        new Path(schemeCorrectedPath).toUri.toString
-      }.collect {
-        case f if fileList.contains(f) => f
-      }.map(Row(_))

Review comment:
       @sarutak, what's the diff here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816439982


   **[Test build #137116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137116/testReport)** for PR 32074 at commit [`7416724`](https://github.com/apache/spark/commit/7416724f01b3e9f2bc27eaac286e01f079f4a37e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816641516


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137116/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817860866


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137208/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-819158866


   Thanks all. Merged into `master`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817860866


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137208/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814788670


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41589/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r611467675



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
##########
@@ -67,17 +65,16 @@ case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends Runn
   }
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val fileList = sparkSession.sparkContext.listFiles()
-    if (files.size > 0) {
-      files.map { f =>
-        val uri = new URI(f)
-        val schemeCorrectedPath = uri.getScheme match {
-          case null | "local" => new File(f).getCanonicalFile.toURI.toString
-          case _ => f
-        }
-        new Path(schemeCorrectedPath).toUri.toString

Review comment:
       Actually, we can get a wrong result with `new Path(schemeCorrectedPath).toUri.toString`.
   The constructor of `Path` which takes a `String` parameter doesn't handle URI formed string properly.
   For example, "file:///test%20file.txt" is already encoded and valid URI but the constructor handles such URI as **NOT** encoded.
   As a result, `new Path("file:///path/to/test%20file.txt").toUri.toString` returns `file:///path/to/test%2020file.txt` though `file:///path/to/test%20file.txt` is expected.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817649909


   **[Test build #137208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137208/testReport)** for PR 32074 at commit [`3f78c43`](https://github.com/apache/spark/commit/3f78c4387f6acc3c781a39876214376ba86b0e2b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815267183


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137026/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816438572


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137110/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815040801


   **[Test build #137026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137026/testReport)** for PR 32074 at commit [`752da21`](https://github.com/apache/spark/commit/752da2178e5698281f39e12de8a4be748e13feea).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r611322214



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
##########
@@ -67,17 +65,16 @@ case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends Runn
   }
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val fileList = sparkSession.sparkContext.listFiles()
-    if (files.size > 0) {
-      files.map { f =>
-        val uri = new URI(f)
-        val schemeCorrectedPath = uri.getScheme match {
-          case null | "local" => new File(f).getCanonicalFile.toURI.toString
-          case _ => f
-        }
-        new Path(schemeCorrectedPath).toUri.toString
-      }.collect {
-        case f if fileList.contains(f) => f
-      }.map(Row(_))
+      if (files.size > 0) {

Review comment:
       seems like there's a mistake in indentation 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815419786


   **[Test build #137053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r611324411



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
##########
@@ -96,7 +93,7 @@ case class ListJarsCommand(jars: Seq[String] = Seq.empty[String]) extends Runnab
     val jarList = sparkSession.sparkContext.listJars()
     if (jars.nonEmpty) {
       for {
-        jarName <- jars.map(f => new Path(f).getName)

Review comment:
       So, I think this is fine too (?)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815079934


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41604/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816474004






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816359502






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815457683


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41632/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817693652


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41787/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815451838






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r608354674



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
##########
@@ -952,6 +952,72 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
     }
   }
 
+  test("SPARK-34977: LIST FILES/JARS/ARCHIVES should handle multiple quoted path arguments") {
+    withTempDir { dir =>
+      val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
+      val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
+      val file3 = File.createTempFile("someprefix3", "somesuffix 3", dir)
+
+      Files.write(file1.toPath, "file1".getBytes)
+      Files.write(file2.toPath, "file2".getBytes)
+      Files.write(file3.toPath, "file3".getBytes)
+
+      sql(s"ADD FILE ${file1.getAbsolutePath}")
+      sql(s"ADD FILE ${file2.getAbsolutePath}")
+      sql(s"ADD FILE '${file3.getAbsolutePath}'")
+      val listFiles = sql("LIST FILES " +
+        s"'${file1.getAbsolutePath}' ${file2.getAbsolutePath} '${file3.getAbsolutePath}'")
+      assert(listFiles.count === 3)
+      assert(listFiles.filter(_.getString(0).contains(file1.getName)).count() === 1)
+      assert(listFiles.filter(_.getString(0).contains(file2.getName)).count() === 1)
+      assert(listFiles.filter(
+        _.getString(0).contains(file3.getName.replace(" ", "%20"))).count() === 1)
+
+      val file4 = File.createTempFile("someprefix4", "somesuffix4", dir)
+      val file5 = File.createTempFile("someprefix5", "somesuffix5", dir)
+      val file6 = File.createTempFile("someprefix6", "somesuffix6", dir)
+      Files.write(file4.toPath, "file4".getBytes)
+      Files.write(file5.toPath, "file5".getBytes)
+      Files.write(file6.toPath, "file6".getBytes)
+
+      val jarFile1 = new File(dir, "test1.jar")
+      val jarFile2 = new File(dir, "test2.jar")
+      val jarFile3 = new File(dir, "test 3.jar")
+      TestUtils.createJar(Seq(file4), jarFile1)
+      TestUtils.createJar(Seq(file5), jarFile2)
+      TestUtils.createJar(Seq(file6), jarFile3)
+
+      sql(s"ADD ARCHIVE ${jarFile1.getAbsolutePath}")
+      sql(s"ADD ARCHIVE ${jarFile2.getAbsolutePath}#foo")
+      sql(s"ADD ARCHIVE '${jarFile3.getAbsolutePath}'")
+      val listArchives = sql("LIST ARCHIVES " +
+        s"'${jarFile1.getAbsolutePath}' ${jarFile2.getAbsolutePath} '${jarFile3.getAbsolutePath}'")
+      assert(listArchives.count === 3)
+      assert(listArchives.filter(_.getString(0).contains(jarFile1.getName)).count() === 1)
+      assert(listArchives.filter(_.getString(0).contains(jarFile2.getName)).count() === 1)
+      assert(listArchives.filter(
+        _.getString(0).contains(jarFile3.getName.replace(" ", "%20"))).count() === 1)
+
+      val file7 = File.createTempFile("someprefix7", "somesuffix7", dir)
+      val file8 = File.createTempFile("someprefix8", "somesuffix8", dir)
+      Files.write(file4.toPath, "file7".getBytes)
+      Files.write(file5.toPath, "file8".getBytes)
+
+      val jarFile4 = new File(dir, "test4.jar")
+      val jarFile5 = new File(dir, "test5.jar")
+      TestUtils.createJar(Seq(file7), jarFile4)
+      TestUtils.createJar(Seq(file8), jarFile5)
+
+      sql(s"ADD JAR ${jarFile4.getAbsolutePath}")
+      sql(s"ADD JAR ${jarFile5.getAbsolutePath}")

Review comment:
       Unlike `ADD FILE "path"` and `ADD ARCHIVE "PATH"`, we cannot execute `ADD JAR "path"` when the path contains whitespaces.
   I think it's a bug and #32052 will fix this issue.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815140114


   #32052 is merged now. Could you rebase this to the master, @sarutak ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816335307


   **[Test build #137110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137110/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814701703


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137011/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816641516


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137116/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816438572


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137110/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816474044


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41695/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815267183


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137026/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815437760


   **[Test build #137053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815419786


   **[Test build #137053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137053/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814661415


   retest this please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815079934


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41604/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817693652


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41787/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r608904983



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
##########
@@ -952,6 +952,72 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
     }
   }
 
+  test("SPARK-34977: LIST FILES/JARS/ARCHIVES should handle multiple quoted path arguments") {
+    withTempDir { dir =>
+      val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
+      val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
+      val file3 = File.createTempFile("someprefix3", "somesuffix 3", dir)
+
+      Files.write(file1.toPath, "file1".getBytes)
+      Files.write(file2.toPath, "file2".getBytes)
+      Files.write(file3.toPath, "file3".getBytes)
+
+      sql(s"ADD FILE ${file1.getAbsolutePath}")
+      sql(s"ADD FILE ${file2.getAbsolutePath}")
+      sql(s"ADD FILE '${file3.getAbsolutePath}'")
+      val listFiles = sql("LIST FILES " +
+        s"'${file1.getAbsolutePath}' ${file2.getAbsolutePath} '${file3.getAbsolutePath}'")
+      assert(listFiles.count === 3)
+      assert(listFiles.filter(_.getString(0).contains(file1.getName)).count() === 1)
+      assert(listFiles.filter(_.getString(0).contains(file2.getName)).count() === 1)
+      assert(listFiles.filter(
+        _.getString(0).contains(file3.getName.replace(" ", "%20"))).count() === 1)
+
+      val file4 = File.createTempFile("someprefix4", "somesuffix4", dir)
+      val file5 = File.createTempFile("someprefix5", "somesuffix5", dir)
+      val file6 = File.createTempFile("someprefix6", "somesuffix6", dir)
+      Files.write(file4.toPath, "file4".getBytes)
+      Files.write(file5.toPath, "file5".getBytes)
+      Files.write(file6.toPath, "file6".getBytes)
+
+      val jarFile1 = new File(dir, "test1.jar")
+      val jarFile2 = new File(dir, "test2.jar")
+      val jarFile3 = new File(dir, "test 3.jar")
+      TestUtils.createJar(Seq(file4), jarFile1)
+      TestUtils.createJar(Seq(file5), jarFile2)
+      TestUtils.createJar(Seq(file6), jarFile3)
+
+      sql(s"ADD ARCHIVE ${jarFile1.getAbsolutePath}")
+      sql(s"ADD ARCHIVE ${jarFile2.getAbsolutePath}#foo")
+      sql(s"ADD ARCHIVE '${jarFile3.getAbsolutePath}'")
+      val listArchives = sql("LIST ARCHIVES " +
+        s"'${jarFile1.getAbsolutePath}' ${jarFile2.getAbsolutePath} '${jarFile3.getAbsolutePath}'")
+      assert(listArchives.count === 3)
+      assert(listArchives.filter(_.getString(0).contains(jarFile1.getName)).count() === 1)
+      assert(listArchives.filter(_.getString(0).contains(jarFile2.getName)).count() === 1)
+      assert(listArchives.filter(
+        _.getString(0).contains(jarFile3.getName.replace(" ", "%20"))).count() === 1)
+
+      val file7 = File.createTempFile("someprefix7", "somesuffix7", dir)
+      val file8 = File.createTempFile("someprefix8", "somesuffix8", dir)
+      Files.write(file4.toPath, "file7".getBytes)
+      Files.write(file5.toPath, "file8".getBytes)
+
+      val jarFile4 = new File(dir, "test4.jar")
+      val jarFile5 = new File(dir, "test5.jar")
+      TestUtils.createJar(Seq(file7), jarFile4)
+      TestUtils.createJar(Seq(file8), jarFile5)
+
+      sql(s"ADD JAR ${jarFile4.getAbsolutePath}")
+      sql(s"ADD JAR ${jarFile5.getAbsolutePath}")

Review comment:
       Got it. @sarutak .




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814630805


   **[Test build #136998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136998/testReport)** for PR 32074 at commit [`c5a6345`](https://github.com/apache/spark/commit/c5a6345be8b5f49466e03a4c563ec7ae3ef5a40e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814662891






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816474044


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41695/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814665240


   **[Test build #137011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137011/testReport)** for PR 32074 at commit [`c5a6345`](https://github.com/apache/spark/commit/c5a6345be8b5f49466e03a4c563ec7ae3ef5a40e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814678411


   **[Test build #137011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137011/testReport)** for PR 32074 at commit [`c5a6345`](https://github.com/apache/spark/commit/c5a6345be8b5f49466e03a4c563ec7ae3ef5a40e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815451838


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137053/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816371261


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41688/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814788670


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41589/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814662891






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817849173


   **[Test build #137208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137208/testReport)** for PR 32074 at commit [`3f78c43`](https://github.com/apache/spark/commit/3f78c4387f6acc3c781a39876214376ba86b0e2b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816422426


   **[Test build #137110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137110/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816332378


   retest this please.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816335307


   **[Test build #137110 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137110/testReport)** for PR 32074 at commit [`b4cb445`](https://github.com/apache/spark/commit/b4cb445706daad51876ce36d6d80c50d295bac2b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak closed pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak closed pull request #32074:
URL: https://github.com/apache/spark/pull/32074


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814643086


   **[Test build #136998 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136998/testReport)** for PR 32074 at commit [`c5a6345`](https://github.com/apache/spark/commit/c5a6345be8b5f49466e03a4c563ec7ae3ef5a40e).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815457653






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816439982


   **[Test build #137116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137116/testReport)** for PR 32074 at commit [`7416724`](https://github.com/apache/spark/commit/7416724f01b3e9f2bc27eaac286e01f079f4a37e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817649909


   **[Test build #137208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137208/testReport)** for PR 32074 at commit [`3f78c43`](https://github.com/apache/spark/commit/3f78c4387f6acc3c781a39876214376ba86b0e2b).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817687539






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814630805


   **[Test build #136998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136998/testReport)** for PR 32074 at commit [`c5a6345`](https://github.com/apache/spark/commit/c5a6345be8b5f49466e03a4c563ec7ae3ef5a40e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815071351


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41604/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r611324298



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
##########
@@ -67,17 +65,16 @@ case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends Runn
   }
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val fileList = sparkSession.sparkContext.listFiles()
-    if (files.size > 0) {
-      files.map { f =>
-        val uri = new URI(f)
-        val schemeCorrectedPath = uri.getScheme match {
-          case null | "local" => new File(f).getCanonicalFile.toURI.toString
-          case _ => f
-        }
-        new Path(schemeCorrectedPath).toUri.toString

Review comment:
       @sarutak, I think `listFiles` is always URIs if I am not mistaken. If they are all canonical URIs, I believe we don't have to switch `Path` (?).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814665240


   **[Test build #137011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137011/testReport)** for PR 32074 at commit [`c5a6345`](https://github.com/apache/spark/commit/c5a6345be8b5f49466e03a4c563ec7ae3ef5a40e).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814658765


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41575/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-816620187


   **[Test build #137116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137116/testReport)** for PR 32074 at commit [`7416724`](https://github.com/apache/spark/commit/7416724f01b3e9f2bc27eaac286e01f079f4a37e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class UpdateStarAction(condition: Option[Expression]) extends MergeAction `
     * `case class InsertStarAction(condition: Option[Expression]) extends MergeAction `


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r608354674



##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
##########
@@ -952,6 +952,72 @@ class HiveQuerySuite extends HiveComparisonTest with SQLTestUtils with BeforeAnd
     }
   }
 
+  test("SPARK-34977: LIST FILES/JARS/ARCHIVES should handle multiple quoted path arguments") {
+    withTempDir { dir =>
+      val file1 = File.createTempFile("someprefix1", "somesuffix1", dir)
+      val file2 = File.createTempFile("someprefix2", "somesuffix2", dir)
+      val file3 = File.createTempFile("someprefix3", "somesuffix 3", dir)
+
+      Files.write(file1.toPath, "file1".getBytes)
+      Files.write(file2.toPath, "file2".getBytes)
+      Files.write(file3.toPath, "file3".getBytes)
+
+      sql(s"ADD FILE ${file1.getAbsolutePath}")
+      sql(s"ADD FILE ${file2.getAbsolutePath}")
+      sql(s"ADD FILE '${file3.getAbsolutePath}'")
+      val listFiles = sql("LIST FILES " +
+        s"'${file1.getAbsolutePath}' ${file2.getAbsolutePath} '${file3.getAbsolutePath}'")
+      assert(listFiles.count === 3)
+      assert(listFiles.filter(_.getString(0).contains(file1.getName)).count() === 1)
+      assert(listFiles.filter(_.getString(0).contains(file2.getName)).count() === 1)
+      assert(listFiles.filter(
+        _.getString(0).contains(file3.getName.replace(" ", "%20"))).count() === 1)
+
+      val file4 = File.createTempFile("someprefix4", "somesuffix4", dir)
+      val file5 = File.createTempFile("someprefix5", "somesuffix5", dir)
+      val file6 = File.createTempFile("someprefix6", "somesuffix6", dir)
+      Files.write(file4.toPath, "file4".getBytes)
+      Files.write(file5.toPath, "file5".getBytes)
+      Files.write(file6.toPath, "file6".getBytes)
+
+      val jarFile1 = new File(dir, "test1.jar")
+      val jarFile2 = new File(dir, "test2.jar")
+      val jarFile3 = new File(dir, "test 3.jar")
+      TestUtils.createJar(Seq(file4), jarFile1)
+      TestUtils.createJar(Seq(file5), jarFile2)
+      TestUtils.createJar(Seq(file6), jarFile3)
+
+      sql(s"ADD ARCHIVE ${jarFile1.getAbsolutePath}")
+      sql(s"ADD ARCHIVE ${jarFile2.getAbsolutePath}#foo")
+      sql(s"ADD ARCHIVE '${jarFile3.getAbsolutePath}'")
+      val listArchives = sql("LIST ARCHIVES " +
+        s"'${jarFile1.getAbsolutePath}' ${jarFile2.getAbsolutePath} '${jarFile3.getAbsolutePath}'")
+      assert(listArchives.count === 3)
+      assert(listArchives.filter(_.getString(0).contains(jarFile1.getName)).count() === 1)
+      assert(listArchives.filter(_.getString(0).contains(jarFile2.getName)).count() === 1)
+      assert(listArchives.filter(
+        _.getString(0).contains(jarFile3.getName.replace(" ", "%20"))).count() === 1)
+
+      val file7 = File.createTempFile("someprefix7", "somesuffix7", dir)
+      val file8 = File.createTempFile("someprefix8", "somesuffix8", dir)
+      Files.write(file4.toPath, "file7".getBytes)
+      Files.write(file5.toPath, "file8".getBytes)
+
+      val jarFile4 = new File(dir, "test4.jar")
+      val jarFile5 = new File(dir, "test5.jar")
+      TestUtils.createJar(Seq(file7), jarFile4)
+      TestUtils.createJar(Seq(file8), jarFile5)
+
+      sql(s"ADD JAR ${jarFile4.getAbsolutePath}")
+      sql(s"ADD JAR ${jarFile5.getAbsolutePath}")

Review comment:
       Unlike `ADD FILE "path"` and `ADD ARCHIVE "path"`, we cannot execute `ADD JAR "path"` when the path contains whitespaces.
   I think it's a bug and #32052 will fix this issue.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] sarutak commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-817463916


   cc: @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814701703


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137011/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #32074:
URL: https://github.com/apache/spark/pull/32074#discussion_r611321847



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/resources.scala
##########
@@ -67,17 +65,16 @@ case class ListFilesCommand(files: Seq[String] = Seq.empty[String]) extends Runn
   }
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val fileList = sparkSession.sparkContext.listFiles()
-    if (files.size > 0) {
-      files.map { f =>
-        val uri = new URI(f)
-        val schemeCorrectedPath = uri.getScheme match {
-          case null | "local" => new File(f).getCanonicalFile.toURI.toString
-          case _ => f
-        }
-        new Path(schemeCorrectedPath).toUri.toString
-      }.collect {
-        case f if fileList.contains(f) => f
-      }.map(Row(_))

Review comment:
       @sarutak, what's the diff here except `Utils.resolveURI`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-814784246


   Kubernetes integration test unable to build dist.
   
   exiting with code: 1
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41589/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815247138


   **[Test build #137026 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137026/testReport)** for PR 32074 at commit [`752da21`](https://github.com/apache/spark/commit/752da2178e5698281f39e12de8a4be748e13feea).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32074: [SPARK-34977][SQL] LIST FILES/JARS/ARCHIVES cannot handle multiple arguments properly when at least one path is quoted

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32074:
URL: https://github.com/apache/spark/pull/32074#issuecomment-815040801


   **[Test build #137026 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137026/testReport)** for PR 32074 at commit [`752da21`](https://github.com/apache/spark/commit/752da2178e5698281f39e12de8a4be748e13feea).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org