You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/15 02:06:34 UTC

[GitHub] [spark] scravy opened a new pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

scravy opened a new pull request #31565:
URL: https://github.com/apache/spark/pull/31565


   This fixes https://issues.apache.org/jira/browse/SPARK-34438
   
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
     7. If you want to add a new configuration, please read the guideline first for naming configurations in
        'core/src/main/scala/org/apache/spark/internal/config/ConfigEntry.scala'.
   -->
   
   ### What changes were proposed in this pull request?
   
   This changes `spark-submit` to accept URIs which have their path component ending in `.py` as py-spark drivers. This is important to work for example with pre-signed URIs from AWS S3.
   
   ### Why are the changes needed?
   
   It is surprising to see that a regular URI does not work as expected, i.e. `http://my-web-server/driver.py` works as expected, but `http://my-web-server/driver.py?some-query` does not (spark will try to download the latter as a JAR).
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, as it makes `spark-submit` accept a wider range of inputs as Python or R drivers (the correct range).
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851113109


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851132964


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851113186


   **[Test build #139090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139090/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851104586


   cc @tgravescs, @steveloughran, @Ngone51 @mridulm @attilapiros FYI


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #31565:
URL: https://github.com/apache/spark/pull/31565


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780279169


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39764/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851148106


   **[Test build #139090 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139090/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-921346355


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780261682


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39764/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851126484


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780266532


   **[Test build #135185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135185/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31565:
URL: https://github.com/apache/spark/pull/31565#discussion_r576536530



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1072,14 +1072,29 @@ object SparkSubmit extends CommandLineUtils with Logging {
    * Return whether the given primary resource requires running python.
    */
   private[deploy] def isPython(res: String): Boolean = {
-    res != null && res.endsWith(".py") || res == PYSPARK_SHELL
+    if (res == null) {
+      return false
+    }
+    if (res == PYSPARK_SHELL) {
+      return true
+    }
+    val uri = new java.net.URI(res)

Review comment:
       I think the problem here is that it wouldn't work with local paths that has special characters disallowed in URI. See `PythonRunner` or `RRunner`.
   
   Can we just do something like `Try(new java.net.URI(res).getPath).getOrElse(res).endsWith(".py")`? We will also have to have a test case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851148815


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/139090/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780319879


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135185/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] scravy commented on a change in pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
scravy commented on a change in pull request #31565:
URL: https://github.com/apache/spark/pull/31565#discussion_r576713319



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1072,14 +1072,29 @@ object SparkSubmit extends CommandLineUtils with Logging {
    * Return whether the given primary resource requires running python.
    */
   private[deploy] def isPython(res: String): Boolean = {
-    res != null && res.endsWith(".py") || res == PYSPARK_SHELL
+    if (res == null) {
+      return false
+    }
+    if (res == PYSPARK_SHELL) {
+      return true
+    }
+    val uri = new java.net.URI(res)

Review comment:
       @HyukjinKwon I am handling the possible URI syntax exception now and also extracted the common code into a separate private function with a comment. Let me know whether that is more what you had in mind.
   
   Gonna provide a test case later on.
   
   This issue should be labelled SPARK_SUBMIT I think? Or can I add the label by changing the title? I noticed the square brackets allover the place.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851113533


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780314233


   **[Test build #135185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135185/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] scravy commented on a change in pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
scravy commented on a change in pull request #31565:
URL: https://github.com/apache/spark/pull/31565#discussion_r576564357



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1072,14 +1072,29 @@ object SparkSubmit extends CommandLineUtils with Logging {
    * Return whether the given primary resource requires running python.
    */
   private[deploy] def isPython(res: String): Boolean = {
-    res != null && res.endsWith(".py") || res == PYSPARK_SHELL
+    if (res == null) {
+      return false
+    }
+    if (res == PYSPARK_SHELL) {
+      return true
+    }
+    val uri = new java.net.URI(res)

Review comment:
       @HyukjinKwon Would this be a good place to add a test case? -> https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/deploy/PythonRunnerSuite.scala
   
   Good point about the local paths... I'll take care of that.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851129414


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851113533






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851113109


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-850733793


   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780246032


   **[Test build #135183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135183/testReport)** for PR 31565 at commit [`e7451f6`](https://github.com/apache/spark/commit/e7451f63bf02bda2ebd7e691fcb7061168d62d55).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780297922






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780279169


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/39764/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851132964






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780266532


   **[Test build #135185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135185/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-778904524


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31565:
URL: https://github.com/apache/spark/pull/31565#discussion_r576565152



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1072,14 +1072,29 @@ object SparkSubmit extends CommandLineUtils with Logging {
    * Return whether the given primary resource requires running python.
    */
   private[deploy] def isPython(res: String): Boolean = {
-    res != null && res.endsWith(".py") || res == PYSPARK_SHELL
+    if (res == null) {
+      return false
+    }
+    if (res == PYSPARK_SHELL) {
+      return true
+    }
+    val uri = new java.net.URI(res)

Review comment:
       Yes, I think you should add a test in places under `org.apache.spark.deploy` such as `SparkSubmitSuite` and `PythonRunnerSuite` as you pointed out.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780287116


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39766/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780287584


   **[Test build #135183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135183/testReport)** for PR 31565 at commit [`e7451f6`](https://github.com/apache/spark/commit/e7451f63bf02bda2ebd7e691fcb7061168d62d55).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-778904524


   Can one of the admins verify this patch?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] steveloughran commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
steveloughran commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-857152998


   Interesting use case. + @mukund-thakur @mehakmeet
   
   I'm not sure that the s3a connector handles ? presigning, so I'm not worried there. If https URLs to an s3 bucket are given out with the ? args, then AWS S3 will either take the signature and allow or fail with 40-something.
   
   There's no harm in comparing this way for any of the stores, not that I can see, and it can only help with this specific use case. 
   
   Not that efficient to be pulling in individual files though., not if you have a lot of them


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851113186


   **[Test build #139090 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/139090/testReport)** for PR 31565 at commit [`cced137`](https://github.com/apache/spark/commit/cced1372715fd1654cbc40620a30116e28b245db).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on a change in pull request #31565: [SPARK-34438] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #31565:
URL: https://github.com/apache/spark/pull/31565#discussion_r577259333



##########
File path: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
##########
@@ -1068,18 +1065,50 @@ object SparkSubmit extends CommandLineUtils with Logging {
     mainClass == "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"
   }
 
+  /**
+   * Extracts the path from a URI or returns the local path.
+   *
+   * This helper is used to determine whether the path or URI given as the driver to
+   * spark-submit is a Python or R file. It needs to parse the path as URI to deal
+   * with URIs that have a query fragment (such as AWS S3 presigned URLs).
+   *
+   * This was introduced to fix https://issues.apache.org/jira/browse/SPARK-34438
+   */
+  private def extractPath(localPathOrUri: String): String = {
+    try {
+      val uri = new URI(localPathOrUri)
+      return uri.getPath
+    } catch {
+      case _: URISyntaxException => return localPathOrUri
+    }
+  }
+
   /**
    * Return whether the given primary resource requires running python.
    */
   private[deploy] def isPython(res: String): Boolean = {
-    res != null && res.endsWith(".py") || res == PYSPARK_SHELL

Review comment:
       I would just do a one line fix:
   
   ```diff
   diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
   index 8f1425fbb84..b2846d2e7a8 100644
   --- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
   +++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
   @@ -1072,14 +1072,16 @@ object SparkSubmit extends CommandLineUtils with Logging {
       * Return whether the given primary resource requires running python.
       */
      private[deploy] def isPython(res: String): Boolean = {
   -    res != null && res.endsWith(".py") || res == PYSPARK_SHELL
   +    // Try to extract the paths from URI to address fragments, see also SPARK-34438.
   +    res != null && Try(new URI(res).getPath).getOrElse(res).endsWith(".py") || res == PYSPARK_SHELL
      }
   
      /**
       * Return whether the given primary resource requires running R.
       */
      private[deploy] def isR(res: String): Boolean = {
   -    res != null && (res.endsWith(".R") || res.endsWith(".r")) || res == SPARKR_SHELL
   +    // Try to extract the paths from URI to address fragments, see also SPARK-34438.
   +    res != null && Try(new URI(res).getPath).getOrElse(res).endsWith(".r") || res == SPARKR_SHELL
      }
   
      private[deploy] def isInternal(res: String): Boolean = {
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780229926


   ok to test


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780281857


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39766/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] closed pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #31565:
URL: https://github.com/apache/spark/pull/31565


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851172026


   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780246032


   **[Test build #135183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135183/testReport)** for PR 31565 at commit [`e7451f6`](https://github.com/apache/spark/commit/e7451f63bf02bda2ebd7e691fcb7061168d62d55).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780319879


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135185/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780267070


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/39764/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-780297921






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #31565: [SPARK-34438][SPARK SUBMIT] Check path component in isPython/isR, not full URI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31565:
URL: https://github.com/apache/spark/pull/31565#issuecomment-851126220


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43611/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org