You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2017/11/02 12:36:51 UTC

[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/19643

    [SPARK-11421][CORE][PYTHON][R] Added ability for addJar to augment the current classloader 

    ## What changes were proposed in this pull request?
    
    This PR takes over https://github.com/apache/spark/pull/15666
    
    > Adds a flag to sc.addJar to add the jar to the current classloader
    
    ## How was this patch tested?
    
    Manually tested. 
    
    > Unit tests, manual tests
    > 
    > This is a continuation of the pull request in #9313 and is mostly a rebase of that moved to master > with SparkR additions.
    
    Closes #15666

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-11421

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19643
    
----
commit ab4b6b15fdaf580193be5f1c2b9befa9e04a207b
Author: Marius van Niekerk <ma...@gmail.com>
Date:   2017-03-18T14:09:26Z

    SPARK-11421 Squashed addjar pr

commit 3c321740f27ebe09dc387005d88cbe6630877e0b
Author: Marius van Niekerk <ma...@gmail.com>
Date:   2017-03-18T14:25:07Z

    Addressed some review comments

commit a62367fd5d22dd8dc9f3a9ee0efd626355e684d7
Author: Marius van Niekerk <ma...@gmail.com>
Date:   2017-03-18T14:30:48Z

    Addressed some review comments

commit 49b9d48ff9001a48892aab5f0179aa039486ca35
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-11-02T11:27:22Z

    Address the rest of comments

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by mariusvniekerk <gi...@git.apache.org>.
Github user mariusvniekerk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r149085525
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    --- End diff --
    
    maybe something like `underlying/backing java process` ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148710799
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    --- End diff --
    
    hmm, is this right `add the jar to the current driver.`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    **[Test build #83596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83596/testReport)** for PR 19643 at commit [`ab52809`](https://github.com/apache/spark/commit/ab5280944d2c82debed68c4df1c1684bb0db88c6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r149641601
  
    --- Diff: python/pyspark/context.py ---
    @@ -860,6 +860,23 @@ def addPyFile(self, path):
                 import importlib
                 importlib.invalidate_caches()
     
    +    def addJar(self, path, addToCurrentClassLoader=False):
    +        """
    +        Adds a JAR dependency for Spark tasks to be executed in the future.
    +        The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +        filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +        If `addToCurrentClassLoader` is true, add the jar to the current threads' class loader
    +        in the backing JVM. In general adding to the current threads' class loader will impact all
    +        other application threads unless they have explicitly changed their class loader.
    --- End diff --
    
    @holdenk and @felixcheung, here I just added the comments back. I thought it's a developer API and might be fine to describe some words related with JVM but .. please let me know if you guys feel we need to take out.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83596/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    cc @shivaram, @felixcheung, @mariusvniekerk, @holdenk and @brkyvz who were in the PR. Would you guys mind taking a look please?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83357/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r151585367
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1838,12 +1852,21 @@ class SparkContext(config: SparkConf) extends Logging {
               case _ => path
             }
           }
    +
           if (key != null) {
             val timestamp = System.currentTimeMillis
             if (addedJars.putIfAbsent(key, timestamp).isEmpty) {
               logInfo(s"Added JAR $path at $key with timestamp $timestamp")
               postEnvironmentUpdate()
             }
    +
    +        if (addToCurrentClassLoader) {
    +          Utils.getContextOrSparkClassLoader match {
    +            case cl: MutableURLClassLoader => cl.addURL(Utils.resolveURI(path).toURL)
    --- End diff --
    
    Thanks @jerryshao. Will check through this concern within this weekend and be back.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r149087406
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    --- End diff --
    
    Yup, probably that's better wording. Let me update it after a bit more waiting other review comments. @mariusvniekerk, I am okay with closing it if you happen to have time to proceed yours now, or I can proceed here. Either way works. Up to you :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r149477868
  
    --- Diff: python/pyspark/context.py ---
    @@ -860,6 +860,19 @@ def addPyFile(self, path):
                 import importlib
                 importlib.invalidate_caches()
     
    +    def addJar(self, path, addToCurrentClassLoader=False):
    --- End diff --
    
    We should mention that adding a jar to the current class loader is a developer API and may change.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    **[Test build #83337 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83337/testReport)** for PR 19643 at commit [`49b9d48`](https://github.com/apache/spark/commit/49b9d48ff9001a48892aab5f0179aa039486ca35).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148983977
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    --- End diff --
    
    I think it refers the actual `local:/`:
    
    https://github.com/apache/spark/blob/b2463fad718d25f564d62c50d587610de3d0c5bd/core/src/main/scala/org/apache/spark/SparkContext.scala#L1838
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83339/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Hi @jerryshao. Would you maybe have some time to take a look for this one please?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83340/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    **[Test build #83365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83365/testReport)** for PR 19643 at commit [`b928ab8`](https://github.com/apache/spark/commit/b928ab82c54a72f91c01ba9d3418c80c58ab9da1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    **[Test build #83337 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83337/testReport)** for PR 19643 at commit [`49b9d48`](https://github.com/apache/spark/commit/49b9d48ff9001a48892aab5f0179aa039486ca35).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148689276
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    +#'
    +#' @rdname spark.addJar
    +#' @param path The path of the jar to be added
    +#' @param addToCurrentClassLoader Whether to add the jar to the current driver class loader.
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' spark.addJar("/path/to/something.jar", TRUE)
    +#'}
    +#' @note spark.addJar since 2.3.0
    +spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
    +  normalizedPath <- suppressWarnings(normalizePath(path))
    --- End diff --
    
    I avoided to pass URI here by passing the abs path for now BTW.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83331/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148520298
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    +#'
    +#' @rdname spark.addJar
    +#' @param path The path of the jar to be added
    +#' @param addToCurrentClassLoader Whether to add the jar to the current driver class loader.
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' spark.addJar("/path/to/something.jar", TRUE)
    +#'}
    +#' @note spark.addJar since 2.3.0
    +spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
    +  normalizedPath <- suppressWarnings(normalizePath(path))
    --- End diff --
    
    Looks we have a problem here with handling URI, Windows path, although most of other cases should be fine though:
    
    ```r
    > normalizePath("file:/C:/a/b/c")
    [1] "C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"
    Warning message:
    In normalizePath(path.expand(path), winslash, mustWork) :
      path[1]="file:/C:/a/b/c": The filename, directory name, or volume label syntax
     is incorrect
    ```
    
    This looks ending up with an weird path like `"C:\\Users\\IEUser\\workspace\\spark\\file:\\C:\\a\\b\\c"`.
    
    I am not sure how we should handle this as this pattern ` normalizedPath <- suppressWarnings(normalizePath(path))` looks quite common.
    
    If it is fine, I would like to address this issue separately for other APIs, for example, `spark.addFile` right above ..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83356/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r151307924
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -1838,12 +1852,21 @@ class SparkContext(config: SparkConf) extends Logging {
               case _ => path
             }
           }
    +
           if (key != null) {
             val timestamp = System.currentTimeMillis
             if (addedJars.putIfAbsent(key, timestamp).isEmpty) {
               logInfo(s"Added JAR $path at $key with timestamp $timestamp")
               postEnvironmentUpdate()
             }
    +
    +        if (addToCurrentClassLoader) {
    +          Utils.getContextOrSparkClassLoader match {
    +            case cl: MutableURLClassLoader => cl.addURL(Utils.resolveURI(path).toURL)
    --- End diff --
    
    I'm not sure does it support remote jars on HTTPS or Hadoop FileSystems?In the executor side, we handle this explicitly by downloading jars to local and  add to classpath, but here looks like we don't have such logic. I'm not sure how this `URLClassLoader` communicate with Hadoop or Https without certificates.
    
    The `addJar` is just adding jars to fileserver, so that executor could fetch them from driver and add to classpath. It will not affect driver's classpath. If we support adding jars to current driver's classloader, then how do we leverage this newly added jars?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148710840
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    --- End diff --
    
    is `local:/path` referring to windows drive/path, or the actual text `local:/` should be there?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r151839304
  
    --- Diff: python/pyspark/context.py ---
    @@ -860,6 +860,23 @@ def addPyFile(self, path):
                 import importlib
                 importlib.invalidate_caches()
     
    +    def addJar(self, path, addToCurrentClassLoader=False):
    +        """
    +        Adds a JAR dependency for Spark tasks to be executed in the future.
    +        The `path` passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +        filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +        If `addToCurrentClassLoader` is true, add the jar to the current threads' class loader
    +        in the backing JVM. In general adding to the current threads' class loader will impact all
    +        other application threads unless they have explicitly changed their class loader.
    --- End diff --
    
    So we currently use `.. note:: DeveloperApi` to indicate it's a developer API (see ml/pipeline and friends for an example).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148710943
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    +#'
    +#' @rdname spark.addJar
    +#' @param path The path of the jar to be added
    +#' @param addToCurrentClassLoader Whether to add the jar to the current driver class loader.
    +#' @export
    +#' @examples
    +#'\dontrun{
    +#' spark.addJar("/path/to/something.jar", TRUE)
    +#'}
    +#' @note spark.addJar since 2.3.0
    +spark.addJar <- function(path, addToCurrentClassLoader = FALSE) {
    +  normalizedPath <- suppressWarnings(normalizePath(path))
    --- End diff --
    
    yea, normalizePath wouldn't handle url...
    https://stat.ethz.ch/R-manual/R-devel/library/base/html/normalizePath.html
    
    I think we should require absolute paths in their canonical form here and just pass through..


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83365/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    **[Test build #83365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83365/testReport)** for PR 19643 at commit [`b928ab8`](https://github.com/apache/spark/commit/b928ab82c54a72f91c01ba9d3418c80c58ab9da1).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Let me leave this closed now and will reopen when I am ready to proceed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon closed the pull request at:

    https://github.com/apache/spark/pull/19643


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Most of codes were by @mariusvniekerk and I think it should credit to him. I did some clean up and addressed the review comments not being addressed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    **[Test build #83596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83596/testReport)** for PR 19643 at commit [`ab52809`](https://github.com/apache/spark/commit/ab5280944d2c82debed68c4df1c1684bb0db88c6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r149087020
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    --- End diff --
    
    Oh, you are back!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r151296077
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    --- End diff --
    
    @mariusvniekerk are you okay with proceeding this here?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83337/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19643#discussion_r148983980
  
    --- Diff: R/pkg/R/context.R ---
    @@ -319,6 +319,27 @@ spark.addFile <- function(path, recursive = FALSE) {
       invisible(callJMethod(sc, "addFile", suppressWarnings(normalizePath(path)), recursive))
     }
     
    +#' Adds a JAR dependency for Spark tasks to be executed in the future.
    +#'
    +#' The \code{path} passed can be either a local file, a file in HDFS (or other Hadoop-supported
    +#' filesystems), an HTTP, HTTPS or FTP URI, or local:/path for a file on every worker node.
    +#' If \code{addToCurrentClassLoader} is true, add the jar to the current driver.
    --- End diff --
    
    I think it is roughly right .. I wanted to avoid the words like "classloader" or "thread" .. Not sure what's the best wording to describe this within R / Python contexts.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19643: [SPARK-11421][CORE][PYTHON][R] Added ability for addJar ...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19643
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org