You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by pwendell <gi...@git.apache.org> on 2014/02/21 06:31:20 UTC

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

GitHub user pwendell opened a pull request:

    https://github.com/apache/incubator-spark/pull/625

    SPARK-1111: URL Validation Throws Error for HDFS URL's

    Fixes an error where HDFS URL's cause an exception. Should be merged into master and 0.9.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/pwendell/incubator-spark url-validation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/625.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #625
    
----
commit fa25ee2b02aa5b4518e76938cce71aad7239ba31
Author: Patrick Wendell <pw...@gmail.com>
Date:   2014-02-21T05:29:19Z

    SPARK-1111: URL Validation Throws Error for HDFS URL's

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35706836
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35710134
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9956570
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala ---
    @@ -115,3 +110,7 @@ private[spark] class ClientArguments(args: Array[String]) {
         System.exit(exitCode)
       }
     }
    +
    +object ClientArguments {
    +  def isValidJarUrl(s: String) = s.matches("(.+):(.+)jar")
    --- End diff --
    
    Hey @hsaputra - I'd think we should recommend "//" to users since it's the widely used way. Unfortunately URL naming is totally messed up and inconsistent not only in Hadoop but in the broader internet ecosystem. The recommended philosphy in the IETF is to be "liberal in what you accept, and conservative in what you send". I think we should be liberal in accepting weird URL's, but we should not advise users to make quirky URL's.
    
    http://en.wikipedia.org/wiki/Robustness_principle
    
    That said, this PR is mainly because java.net.URL (incorrectly) considers `hdfs` an invalid scheme. So we can't rely on java.net.URL to validate this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9955979
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala ---
    @@ -115,3 +110,7 @@ private[spark] class ClientArguments(args: Array[String]) {
         System.exit(exitCode)
       }
     }
    +
    +object ClientArguments {
    +  def isValidJarUrl(s: String) = s.matches("(.+):(.+)jar")
    --- End diff --
    
    Ah, I did not know Hadoop is ok with file:/ pattern, I always use file://
    
    Thanks for the response, @pwendell
    
    But we do want URL instead of URI for the JAR path location I suppose.
    So this PR is just to "relax" java.net.URL requirement to have the double slashes?
    If that is the case, then maybe we could update the comment to include the ones without "//"
    
    e.g. hdfs://XX.jar, file:/XX.jar, file:XX.jar
    
    Sorry about the questions for this small PR, just wanted to know more about the reasoning.
    Always nervous whenever we add custom URL/ URI validation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35706454
  
    Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35756614
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9939149
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/ClientSuite.scala ---
    @@ -0,0 +1,17 @@
    +package org.apache.spark.deploy
    --- End diff --
    
    Missing license ASF header?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35706835
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35700516
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-spark/pull/625


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9955905
  
    --- Diff: core/src/test/scala/org/apache/spark/deploy/ClientSuite.scala ---
    @@ -0,0 +1,17 @@
    +package org.apache.spark.deploy
    --- End diff --
    
    @hsaputra good catch. Fixed!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35756615
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35762279
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35701663
  
    Merged build finished.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35704996
  
    LGTM save the minor regex nit


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35705998
  
    @aarondav fixed the nit, waiting for tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by pwendell <gi...@git.apache.org>.
Github user pwendell commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9953789
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala ---
    @@ -115,3 +110,7 @@ private[spark] class ClientArguments(args: Array[String]) {
         System.exit(exitCode)
       }
     }
    +
    +object ClientArguments {
    +  def isValidJarUrl(s: String) = s.matches("(.+):(.+)jar")
    --- End diff --
    
    @hsaputra - actually the `//` is not a require part of a URI, it is a scheme-dependent part that schemes chose to handle differently. For instance I think that Hadoop is alright with `file:/path/to/my/file` or even `file:path/to/my/file`.
    
    This particular piece of code is just providing a quick sanity check. If the user enters an invalid URL they will get a specific error message downstream...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35700515
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35710135
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12797/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9956945
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala ---
    @@ -115,3 +110,7 @@ private[spark] class ClientArguments(args: Array[String]) {
         System.exit(exitCode)
       }
     }
    +
    +object ClientArguments {
    +  def isValidJarUrl(s: String) = s.matches("(.+):(.+)jar")
    --- End diff --
    
    Ah yes
    
    scala> val test = new java.net.URL("hdfs://test.jar")
    java.net.MalformedURLException: unknown protocol: hdfs
    	at java.net.URL.<init>(URL.java:574)
    
    I thought java.net.URL accepts custom scheme =(
    Agree about messed up naming usage. 
    
    +1 
    
    Thanks for the explanation! =)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by hsaputra <gi...@git.apache.org>.
Github user hsaputra commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9939595
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala ---
    @@ -115,3 +110,7 @@ private[spark] class ClientArguments(args: Array[String]) {
         System.exit(exitCode)
       }
     }
    +
    +object ClientArguments {
    +  def isValidJarUrl(s: String) = s.matches("(.+):(.+)jar")
    --- End diff --
    
    Should we also check for "//" in the String to prevent something like "hdfs:test.jar"?
    Something like: s.matches("(.+)://([^/]+)jar")
    
    Tried with REPL:
    val s = "hdfs://test.jar"
      s: java.lang.String = hdfs://test.jar
    s.matches("(.+)://([^/]+)jar")
      res3: Boolean = true
    s.matches("(.+):(.+)jar")
      res4: Boolean = true
    val s2 = "hdfs:test.jar"
      s2: java.lang.String = hdfs:test.jar
    s2.matches("(.+)://(.+)jar")
      res5: Boolean = false
    s2.matches("(.+):(.+)jar")
      res6: Boolean = true
    
    We do not want dfs:test.jar to be true.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35701664
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12793/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on a diff in the pull request:

    https://github.com/apache/incubator-spark/pull/625#discussion_r9938103
  
    --- Diff: core/src/main/scala/org/apache/spark/deploy/ClientArguments.scala ---
    @@ -115,3 +110,7 @@ private[spark] class ClientArguments(args: Array[String]) {
         System.exit(exitCode)
       }
     }
    +
    +object ClientArguments {
    +  def isValidJarUrl(s: String) = s.matches("^(.+):(.+)jar")
    --- End diff --
    
    "^" is technically not needed since matches is full-string anyway


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by aarondav <gi...@git.apache.org>.
Github user aarondav commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35762793
  
    Thanks! Merged in master and branch-0.9.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---

[GitHub] incubator-spark pull request: SPARK-1111: URL Validation Throws Er...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/incubator-spark/pull/625#issuecomment-35762280
  
    All automated tests passed.
    Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/12803/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. To do so, please top-post your response.
If your project does not have this feature enabled and wishes so, or if the
feature is enabled but not working, please contact infrastructure at
infrastructure@apache.org or file a JIRA ticket with INFRA.
---