Posted to reviews@spark.apache.org by weiqingy <gi...@git.apache.org> on 2017/03/18 19:25:34 UTC

[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

GitHub user weiqingy opened a pull request:

    https://github.com/apache/spark/pull/17342

    [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

    ## What changes were proposed in this pull request?
    Spark 2.2 is about to be cut; it would be great if SPARK-12868 could be resolved before that. There have been several PRs for this, such as [PR#16324](https://github.com/apache/spark/pull/16324), but all of them have been inactive for a long time or have been closed.
    
    This PR adds a SparkUrlStreamHandlerFactory, which uses the URL's protocol to choose the appropriate
    URLStreamHandlerFactory (e.g. FsUrlStreamHandlerFactory) to create the URLStreamHandler.
    
    ## How was this patch tested?
    1. Added a new unit test.
    2. Checked manually.
    Before: an exception was thrown with "failed unknown protocol: hdfs":
    <img width="914" alt="screen shot 2017-03-17 at 9 07 36 pm" src="https://cloud.githubusercontent.com/assets/8546874/24075277/5abe0a7c-0bd5-11e7-900e-ec3d3105da0b.png">
    
    After:
    <img width="1148" alt="screen shot 2017-03-18 at 11 42 18 am" src="https://cloud.githubusercontent.com/assets/8546874/24075283/69382a60-0bd5-11e7-8d30-d9405c3aaaba.png">
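
    For illustration, the delegating-factory idea can be sketched with the JDK alone; a stub handler stands in for Hadoop's FsUrlStreamHandlerFactory, and all names here are illustrative rather than the PR's exact code:

    ```scala
    import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}

    // Stand-in for a real handler: the PR delegates to Hadoop's
    // FsUrlStreamHandlerFactory, which knows how to open HDFS streams.
    class StubHandler extends URLStreamHandler {
      override def openConnection(url: URL): URLConnection =
        throw new UnsupportedOperationException("stub: no real I/O")
    }

    // Delegating factory in the spirit of SparkUrlStreamHandlerFactory:
    // returning null tells the JVM to fall back to its built-in handlers.
    class DelegatingFactory extends URLStreamHandlerFactory {
      def createURLStreamHandler(protocol: String): URLStreamHandler =
        if (protocol.equalsIgnoreCase("hdfs")) new StubHandler else null
    }

    object Demo {
      def main(args: Array[String]): Unit = {
        URL.setURLStreamHandlerFactory(new DelegatingFactory)
        val u = new URL("hdfs://namenode:8020/test.jar") // no longer malformed
        println(u.getProtocol) // prints "hdfs"
      }
    }
    ```

    Constructing the URL only requires a registered handler for the scheme; no connection is opened, which is why a test of this needs no network.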

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiqingy/spark SPARK-18910

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17342
    
----
commit 04556c9f2f4feb53e3f644d795a38de4a4e919ca
Author: Weiqing Yang <ya...@gmail.com>
Date:   2017-03-18T18:55:28Z

    [SPARK-18910] Allow adding jars from hdfs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    CC @vanzin @tgravescs , can you please also review this PR? Thanks.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by Chopinxb <gi...@git.apache.org>.
Github user Chopinxb commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    @steveloughran Sorry for the delay, and I really appreciate you creating this issue: [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697)


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76128/
    Test PASSed.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107001274
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    API? no, just fs.*.impl for the standard ones, discovery via META-INF/services and you don't want to go there. Probably better to have a core list of the hadoop redists (including the new 2.8+ adl & oss object stores), and the google cloud URL (gss ? )
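
    For context, the `fs.*.impl` mechanism mentioned above is plain Hadoop configuration: each URI scheme maps to a FileSystem class, and those mappings are what FsUrlStreamHandlerFactory consults. A hypothetical `core-site.xml` fragment (both the scheme and the class name are invented for illustration):

    ```xml
    <!-- Hypothetical mapping: tells Hadoop which FileSystem implementation
         backs the "myfs" scheme. Both names are illustrative. -->
    <property>
      <name>fs.myfs.impl</name>
      <value>com.example.MyFileSystem</value>
    </property>
    ```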


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76000/testReport)** for PR 17342 at commit [`48069cc`](https://github.com/apache/spark/commit/48069ccb17785bf4a406459d382b13e70b2e704e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74790/
    Test FAILed.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    @steveloughran Thanks Steve.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75885/testReport)** for PR 17342 at commit [`be0257b`](https://github.com/apache/spark/commit/be0257b6e527e1fbe4b5f19991f20189f04ba426).


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Will check, but it looks like it's related to SPARK-21697.


---



[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74792/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r112374078
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     }
     
     object SharedState {
    +  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    --- End diff --
    
    Good point. I have updated the PR. Thanks.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r113088010
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage)
         }
       }
    +
    +  test("SPARK-12868: Allow adding jars from hdfs ") {
    +    val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
    +    val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL(jarFromHdfs)
    +    var exceptionThrown: Boolean = false
    --- End diff --
    
    Replace this whole block with:
    
    ```
    intercept[MalformedURLException] {
      new URL(jarFromInvalidFs)
    }
    ```
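
    Outside a ScalaTest suite, the suggested `intercept` check boils down to the following plain-Scala behavior (the "fffs" scheme is invented for the negative case):

    ```scala
    import java.net.{MalformedURLException, URL}

    object MalformedDemo {
      def main(args: Array[String]): Unit = {
        // Constructing a URL only needs a registered handler for its scheme;
        // an unknown scheme fails immediately, with no network I/O involved.
        try {
          new URL("fffs://doesnotmatter/test.jar")
          println("no exception")
        } catch {
          case _: MalformedURLException =>
            println("MalformedURLException") // this branch runs
        }
      }
    }
    ```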


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Hi, @rxin Could you please review this PR? Thanks.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107713074
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    @jerryshao I was thinking about using reflection to check whether the API exists and if it exists then we have a whole solution. Maybe it's not worth. I'll just support hdfs for now.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75061/testReport)** for PR 17342 at commit [`bf0dbf9`](https://github.com/apache/spark/commit/bf0dbf9c53e9b2081c595f0a5026405b0839f513).


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107063651
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    I am not sure which file systems `FsUrlStreamHandlerFactory` supports. Maybe for now we just put "hdfs" in, and we can add more when users actually need them?


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107069396
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    We can simply call `URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())` and everything works happily. The only problem is that `URL.setURLStreamHandlerFactory` can only be called once per JVM (check the javadoc of the URL class), so if we want to support more stream handlers in the future, we wouldn't be able to call `URL.setURLStreamHandlerFactory` again. It's like you only have one wall plug but several laptops, so you have to use a power strip.
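
    The once-per-JVM restriction is easy to demonstrate with the JDK alone (no Hadoop involved): the second registration attempt fails with a `java.lang.Error`.

    ```scala
    import java.net.{URL, URLStreamHandler, URLStreamHandlerFactory}

    object OncePerJvm {
      // A factory that handles nothing: returning null falls back to the
      // JDK's built-in handlers (http, file, jar, ...).
      private val noop = new URLStreamHandlerFactory {
        def createURLStreamHandler(protocol: String): URLStreamHandler = null
      }

      def main(args: Array[String]): Unit = {
        URL.setURLStreamHandlerFactory(noop)
        try {
          URL.setURLStreamHandlerFactory(noop) // second call per JVM is rejected
        } catch {
          case _: Error => println("factory already defined")
        }
      }
    }
    ```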


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111870688
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1021,4 +1021,19 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
         assert(redactedConf("spark.regular.property") === "not_a_secret")
       }
    +
    +  test("SparkUrlStreamHandlerFactory") {
    +    URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL("hdfs://docs.oracle.com/test.jar")
    --- End diff --
    
    The test also works without network connection.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Jenkins, test this please


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    @Chopinxb no worries; the hard part is thinking how to fix this. I don't see it being possible to do reliably except through an explicit download. Hadoop 2.8+ has moved off commons-logging so this problem *may* have gone away. However, there are too many dependencies to be confident that will hold


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75887/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    `org.apache.spark.storage.BlockManagerProactiveReplicationSuite.proactive block replication - 3 replicas - 2 block manager deletions` failed, but it passed locally. 


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75061/
    Test PASSed.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107215428
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Say one day we need to support a third-party file system named "shaofs", which provides a ShaoUrlStreamHandlerFactory. We cannot install both FsUrlStreamHandlerFactory and ShaoUrlStreamHandlerFactory in the same JVM (URL.setURLStreamHandlerFactory can only be called once), but we can install SparkUrlStreamHandlerFactory, which delegates to ShaoUrlStreamHandlerFactory or FsUrlStreamHandlerFactory as appropriate.
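    The delegation idea described above can be sketched as follows. This is a hypothetical composite factory, not Spark's actual implementation; the class name and `register` method are illustrative, and the delegates stand in for factories like FsUrlStreamHandlerFactory:

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical composite factory: routes each protocol to the delegate
// factory registered for it. Since URL.setURLStreamHandlerFactory can only
// be called once per JVM, a composite like this is the usual way to support
// several backends at once.
public class CompositeUrlStreamHandlerFactory implements URLStreamHandlerFactory {
    private final Map<String, URLStreamHandlerFactory> byProtocol = new LinkedHashMap<>();

    public void register(String protocol, URLStreamHandlerFactory factory) {
        byProtocol.put(protocol.toLowerCase(), factory);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        URLStreamHandlerFactory factory = byProtocol.get(protocol.toLowerCase());
        // Returning null tells java.net.URL to fall back to its built-in handlers.
        return factory == null ? null : factory.createURLStreamHandler(protocol);
    }
}
```

    A new backend then only needs a `register("shaofs", new ShaoUrlStreamHandlerFactory())` call, rather than another JVM-wide `setURLStreamHandlerFactory` attempt.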




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Build finished. Test FAILed.




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74790/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106977805
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    Yeah, that's a good point. I'll check Hadoop for the full list of supported file systems and, ideally, whether we can get them via some API.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107583524
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    IMHO, we should not rely on a Hadoop 2.8+ feature; Spark's supported Hadoop version is 2.6, so it would be better to have a general solution that avoids depending on a specific version of Hadoop.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r113103389
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage)
         }
       }
    +
    +  test("SPARK-12868: Allow adding jars from hdfs ") {
    +    val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
    +    val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL(jarFromHdfs)
    +    var exceptionThrown: Boolean = false
    --- End diff --
    
    Thanks. PR has been updated.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111303973
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    You could proxy to another `URLStreamHandlerFactory` when `FsUrlStreamHandlerFactory#createURLStreamHandler` returns null.
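    The null-fallback chaining suggested here could look like the sketch below. The class name is hypothetical; the delegates stand in for FsUrlStreamHandlerFactory and any third-party factory, each of which is expected to return null for protocols it does not recognize:

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical chaining factory: tries each delegate in order and uses the
// first non-null handler, so a primary factory (e.g. FsUrlStreamHandlerFactory)
// can be extended with fallbacks without protocol-name comparisons.
public class ChainedUrlStreamHandlerFactory implements URLStreamHandlerFactory {
    private final List<URLStreamHandlerFactory> delegates;

    public ChainedUrlStreamHandlerFactory(URLStreamHandlerFactory... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        for (URLStreamHandlerFactory d : delegates) {
            URLStreamHandler handler = d.createURLStreamHandler(protocol);
            if (handler != null) {
                return handler;
            }
        }
        return null; // let java.net.URL fall back to its built-in handlers
    }
}
```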




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107064456
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    I don't think you have to do the comparison here; Hadoop itself will discover the supported file systems through `fs.*.impl` and the service loader.
    
    Note that HDI uses wasb by default, so the assumption here ("hdfs") may break their code.
    
    This leads to the question below: why do we need to wrap `FsUrlStreamHandlerFactory` at all? The only difference is one extra check for whether the protocol is hdfs, which I think is unnecessary and already handled by Hadoop.
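    The service-loader discovery mentioned above can be illustrated with a minimal sketch. This mirrors the general JDK SPI mechanism (providers listed under `META-INF/services/<interface-name>` on the classpath), not Hadoop's exact internals; the class and method names are illustrative:

```java
import java.net.URLStreamHandlerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch of SPI-based discovery, in the spirit of how Hadoop
// finds FileSystem implementations: each jar declares its providers in
// META-INF/services/<interface-name>, and ServiceLoader instantiates them
// lazily at iteration time. No hard-coded protocol comparisons are needed.
public class SpiDiscovery {
    public static List<URLStreamHandlerFactory> discoverFactories() {
        List<URLStreamHandlerFactory> found = new ArrayList<>();
        for (URLStreamHandlerFactory f : ServiceLoader.load(URLStreamHandlerFactory.class)) {
            found.add(f);
        }
        return found;
    }
}
```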




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107064584
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Can you explain more? I don't see a specific difference between your changes and `FsUrlStreamHandlerFactory` with regard to being `called once per JVM`.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    I'm going to recommend you file a SPARK bug on issues.apache.org, with a linked HDFS issue, "NPE in BlockReaderFactory log init". It looks like the creation of the LOG for BlockReader triggers introspection, which causes BlockReaderFactory to do something before it is fully initialized, and then possibly NPE because the LOG field is null.
    





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111365777
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1021,4 +1021,19 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
         assert(redactedConf("spark.regular.property") === "not_a_secret")
       }
    +
    +  test("SparkUrlStreamHandlerFactory") {
    +    URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL("hdfs://docs.oracle.com/test.jar")
    --- End diff --
    
    you should check to see what happens when you run this test on a machine with no network connection. Everyone hates tests that fail when they rely on DNS working (or, in some cases, DNS not resolving an example.org domain)
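    One way to keep such a test network-independent: constructing a `java.net.URL` only resolves the protocol handler and performs no DNS lookup (nothing touches the network until `openConnection`/`connect`), so an unroutable hostname is safe. The helper below is an illustrative sketch, not the test in this PR:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoNetworkUrlCheck {
    // Returns true iff a URLStreamHandler exists for the URL's protocol.
    // URL construction does no DNS resolution or I/O, so this check works
    // on a machine with no network connection at all.
    public static boolean protocolRecognized(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "unknown protocol: fffs"
        }
    }
}
```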




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by Chopinxb <gi...@git.apache.org>.
Github user Chopinxb commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Have you tried it in yarn-client mode? I applied this patch to v2.1.1 + Hadoop 2.6.0, and when I run "add jar" through the SparkSQL CLI, it throws this error:
    ERROR thriftserver.SparkSQLDriver: Failed in [add jar  hdfs://SunshineNameNode3:8020/lib/clouddata-common-lib/chardet-0.0.1.jar]
    java.lang.ExceptionInInitializerError
    	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
    	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
    	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:947)
    	at java.io.DataInputStream.read(DataInputStream.java:100)
    	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
    	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
    	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
    	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
    	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
    	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2107)
    	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2076)
    	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2052)
    	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1274)
    	at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
    	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
    	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
    	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
    	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:632)
    	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:601)
    	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:267)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:601)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:591)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:738)
    	at org.apache.spark.sql.hive.HiveSessionState.addJar(HiveSessionState.scala:105)
    	at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
    	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
    	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
    	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
    	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
    	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.NullPointerException
    	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
    	at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
    	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
    	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
    	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:947)
    	at java.io.DataInputStream.read(DataInputStream.java:100)
    	at java.nio.file.Files.copy(Files.java:2908)
    	at java.nio.file.Files.copy(Files.java:3027)
    	at sun.net.www.protocol.jar.URLJarFile$1.run(URLJarFile.java:220)
    	at sun.net.www.protocol.jar.URLJarFile$1.run(URLJarFile.java:216)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.net.www.protocol.jar.URLJarFile.retrieve(URLJarFile.java:215)
    	at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:71)
    	at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84)
    	at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
    	at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
    	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:934)
    	at sun.misc.URLClassPath$JarLoader.access$800(URLClassPath.java:791)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:876)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:869)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:868)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:819)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:565)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:555)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:554)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
    	at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:484)
    	at sun.misc.URLClassPath.access$100(URLClassPath.java:65)
    	at sun.misc.URLClassPath$1.next(URLClassPath.java:266)
    	at sun.misc.URLClassPath$1.hasMoreElements(URLClassPath.java:277)
    	at java.net.URLClassLoader$3$1.run(URLClassLoader.java:601)
    	at java.net.URLClassLoader$3$1.run(URLClassLoader.java:599)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader$3.next(URLClassLoader.java:598)
    	at java.net.URLClassLoader$3.hasMoreElements(URLClassLoader.java:623)
    	at sun.misc.CompoundEnumeration.next(CompoundEnumeration.java:45)
    	at sun.misc.CompoundEnumeration.hasMoreElements(CompoundEnumeration.java:54)
    	at org.apache.commons.logging.LogFactory.getConfigurationFile(LogFactory.java:1409)
    	at org.apache.commons.logging.LogFactory.getFactory(LogFactory.java:455)
    	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
    	at org.apache.hadoop.hdfs.BlockReaderFactory.<clinit>(BlockReaderFactory.java:77)
    	... 58 more




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r108775285
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    > Say one day we need to support a third-party file system named "shaofs"
    
    So why not add the abstraction on that day? If `FsUrlStreamHandlerFactory` does what is needed here, and correctly supports more than just the `hdfs` protocol (which isn't even enough to cover HDFS itself, which can be federated), then I don't see what the abstraction is buying us today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75887/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107381697
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    FWIW, we've been backing out of using service discovery for the filesystem clients built into Hadoop (e.g. HADOOP-14138). Why? It was hurting startup times, especially once we'd switched to the fully-shaded, own-Jackson version of the AWS SDK. From Hadoop 2.8+, at least for now, you get the list of internal ones from a scan of config options. But we reserve the right to change that in future.
    
    I'd be amenable to having an API call in which FileSystem lists all URL schemas which the JVM knows about. That doesn't mean that they will load, only that it knows the implementation classname.
    
    I've also considered having a config option which lists all schemas it knows are object stores, a simple comma-separated list, where we could include things like google gss:, even though it's not bundled. Why? It lets apps downstream such as Spark, Hive and Flink see if a filesystem is an object store, without having to add a whole new API. And if it's in the list, they can expect different behaviours, like expensive renames. That, being just an overridable config option, is inexpensive to add.
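The comma-separated config idea above can be sketched as follows. This is an illustrative standalone sketch, not Hadoop code: the key name `fs.object.store.schemas` is invented for this example and is not a real Hadoop property.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class ObjectStoreSchemas {
    // Hypothetical key, invented for this sketch; not a real Hadoop property.
    static final String KEY = "fs.object.store.schemas";

    // Parses a comma-separated schema list such as "s3a, wasb, gs".
    static Set<String> parse(String value) {
        if (value == null) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(value.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Downstream apps could consult the list to anticipate object-store
    // behaviours (e.g. expensive renames) without a new FileSystem API.
    static boolean isObjectStore(String scheme, Set<String> schemas) {
        return schemas.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

Being a plain string option, it can be overridden per deployment without any code change, which is the inexpensive part of the proposal.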





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75886/testReport)** for PR 17342 at commit [`7c2d61a`](https://github.com/apache/spark/commit/7c2d61a81d492aa1b259071e9c9af5a01320dbfb).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76000/testReport)** for PR 17342 at commit [`48069cc`](https://github.com/apache/spark/commit/48069ccb17785bf4a406459d382b13e70b2e704e).




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r112019602
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Thanks for the comments. I have updated the PR. 




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75905/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107326694
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    I see, I get your point. Can you please also address the other remaining issues?




[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106842764
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    +      if (hdfsHandler == null) {
    +        hdfsHandler = new FsUrlStreamHandlerFactory().createURLStreamHandler(protocol)
    --- End diff --
    
    I think you should call this constructor, `public FsUrlStreamHandlerFactory(Configuration conf)`, and use the `Configuration` created by `SparkHadoopUtils`.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106978348
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    +      if (hdfsHandler == null) {
    +        hdfsHandler = new FsUrlStreamHandlerFactory().createURLStreamHandler(protocol)
    --- End diff --
    
    Thanks, I'll follow your suggestion.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Well, no obvious answer there I'm afraid, except "don't put HDFS JARs on the classpath"; if you serve them up via HTTP, all should work.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Sorry about the stack above. The actual error is as below; that's why this relates to SPARK-21697.
    Removing the commons-logging jar fixes it, but copying the commons-logging jar back never triggers the complaint again, which doesn't make sense.
    
    18/08/20 11:47:29 ERROR SparkSQLDriver: Failed in [select check('abc')]
    java.lang.ExceptionInInitializerError
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    ....
    ..
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.NullPointerException
            at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:685)
            at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74790/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106977152
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    `URL#setURLStreamHandlerFactory` can only be called once per JVM. If we set `FsUrlStreamHandlerFactory` directly, we won't be able to support other factories. I wrapped it for future extensibility.
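The dispatch-by-protocol pattern this wrapper uses can be sketched as below. This is an illustrative sketch, not the PR's code: the Hadoop `FsUrlStreamHandlerFactory` delegate is replaced by whatever factory the caller registers, so the sketch runs with only the JDK.

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Dispatches per-protocol to a registered delegate factory. Returning null
// tells the JDK to fall back to its built-in handlers (http, file, jar, ...).
class DispatchingFactory implements URLStreamHandlerFactory {
    private final Map<String, URLStreamHandlerFactory> delegates = new ConcurrentHashMap<>();

    void register(String protocol, URLStreamHandlerFactory factory) {
        delegates.put(protocol.toLowerCase(Locale.ROOT), factory);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        URLStreamHandlerFactory delegate = delegates.get(protocol.toLowerCase(Locale.ROOT));
        return delegate == null ? null : delegate.createURLStreamHandler(protocol);
    }
}
```

Since `URL.setURLStreamHandlerFactory` is one-shot, registering the single dispatching wrapper once leaves room to add delegates for new schemes later without touching the JVM-level registration again.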




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Nobody else seems to have comments, so I'll merge to master / 2.2.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    At a guess, there's possibly a mix of Hadoop HDFS JARs on your classpath. Are you sure everything on the classpath is in sync? What hadoop-hdfs JARs are there?




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r113061767
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     }
     
     object SharedState {
    +  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    --- End diff --
    
    Hi, @vanzin Could you please help to review this PR again? Thanks.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    The failures, I think, were not triggered by this code change. Will re-trigger Jenkins.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75973/testReport)** for PR 17342 at commit [`be16d1a`](https://github.com/apache/spark/commit/be16d1a23d30ad1a031aed4a15e6a7ee3dd51d45).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75905/
    Test PASSed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r112029478
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     }
     
     object SharedState {
    +  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    --- End diff --
    
    I'm wondering if it's better to add a try..catch around this:
    
    ```
    scala> URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    
    scala> URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    java.lang.Error: factory already defined
      at java.net.URL.setURLStreamHandlerFactory(URL.java:1112)
      ... 48 elided
    ```
    
    Normally this wouldn't matter, but if someone is messing with class loaders (e.g. running Spark embedded in a web app in a servlet container), they may run into situations where this code might run twice, or may even fail in the first time (if the user's application also installs a stream handler).
    
    So I think it's safer to catch the error and print a warning message here. But really optimal would be if the "add jar" code didn't use URL at all for this. That's for a future change though.
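The guard suggested above can be sketched like this. Names are illustrative, not Spark's actual code: `NoopFactory` stands in for Hadoop's `FsUrlStreamHandlerFactory` so the sketch runs without Hadoop on the classpath.

```java
import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

class FactoryGuard {
    // Stand-in factory; returning null defers every protocol to the JDK's
    // built-in handlers, so installing it changes no behaviour.
    static class NoopFactory implements URLStreamHandlerFactory {
        public URLStreamHandler createURLStreamHandler(String protocol) {
            return null;
        }
    }

    /**
     * Returns true if this call installed the factory, false if one was
     * already defined in this JVM.
     */
    static boolean trySetFactory(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) {
            // URL.setURLStreamHandlerFactory throws java.lang.Error
            // ("factory already defined") on any call after the first.
            System.err.println("URL stream handler factory already set: " + e.getMessage());
            return false;
        }
    }
}
```

Catching `Error` here is deliberate, despite being unusual: the JDK signals the duplicate registration with `java.lang.Error` rather than an exception, so a warning-and-continue path has to catch exactly that.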




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76000/
    Test PASSed.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75061/testReport)** for PR 17342 at commit [`bf0dbf9`](https://github.com/apache/spark/commit/bf0dbf9c53e9b2081c595f0a5026405b0839f513).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111300718
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Hi, @vanzin @gatorsmile Could you please give some direction on which way to go? We need to support hdfs anyway.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r110511571
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    In a [prior PR](https://github.com/apache/spark/pull/16324), `FsUrlStreamHandlerFactory` was set on the JVM's `URL` class directly. @gatorsmile raised the concern that `URL.setURLStreamHandlerFactory` can be called only once per JVM, and that is the motivation of this PR. Either one is OK for me; however, we've got to choose one.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75886/
    Test FAILed.




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74792/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107075363
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    But with your `SparkUrlStreamHandlerFactory` wrapper, why can it be called more than once? I didn't get the point here.
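    For readers following along: the wrapper being questioned is, in essence, a protocol-dispatching factory. A minimal self-contained sketch of the idea (the stub handler is illustrative only, not the PR's actual code, which delegates to Hadoop's `FsUrlStreamHandlerFactory`):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class Main implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if ("hdfs".equalsIgnoreCase(protocol)) {
            // Stub handler so the sketch is self-contained; the real code
            // would hand back a Hadoop filesystem handler here.
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) throws IOException {
                    throw new IOException("stub handler for " + u);
                }
            };
        }
        return null; // unknown protocol: the JVM falls back to its built-in handlers
    }

    public static void main(String[] args) {
        URLStreamHandlerFactory f = new Main();
        System.out.println(f.createURLStreamHandler("hdfs") != null); // prints true
        System.out.println(f.createURLStreamHandler("http") == null); // prints true
    }
}
```

    Returning `null` for unrecognized protocols is what lets the wrapper coexist with the JVM's default http/https/file handlers.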




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74792/
    Test PASSed.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Thanks, @vanzin .




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Can you update the title to add the `[SQL]` module tag? It looks like a SQL-specific problem.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75905/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Jenkins, test this please




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106841997
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    IMHO I think directly register `FsUrlStreamHandlerFactory` with `URL#setURLStreamHandlerFactory` should be enough, it is not necessary to wrap again.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Created: [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697) with the stack trace attached




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Patching this to spark-2.1.0, I hit several issues:
    java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.BlockReaderFactory
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
            at java.io.DataInputStream.read(DataInputStream.java:100)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
            at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2030)
            at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1999)
            at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
            at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1274)
            at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
            at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
            at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
            at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
            at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:633)
            at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:602)
            at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:287...
    ....
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.BlockReaderFactory
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
            at java.io.DataInputStream.read(DataInputStream.java:100)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107331216
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Yes, I will update the PR soon.




[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106841824
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    It looks like you only support HDFS here; `FsUrlStreamHandlerFactory` can actually support different Hadoop-compatible file systems. Should we also support others, like `wasb` and `webhdfs`?





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r110517523
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    Sorry, missed this. There's nothing explicit in 2.8+ right now; don't hold your breath. If people do want to co-develop one, I'd be happy to help. There's no point in me implementing something which isn't useful or going to be used by downstream projects.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111303746
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    @weiqingy , even if you add a wrapper to try to support different stream handler factories, it is not a good idea to filter on hdfs only. `FsUrlStreamHandlerFactory` supports different fs implementations by default; it is not necessary for the upstream code to check for them deliberately. 




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Hmmm. [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697) has a lot of the CP, but the problem in that one is some recursive loading of artifacts off HDFS, the scan for commons-logging.properties being the trouble spot. 
    
    @rajeshcode , what you have seems more like a classic "class not found" problem, where one class is loading but a dependency isn't being found. And as HDFS has moved its stuff around, splitting the one hadoop-hdfs JAR into separate client and server JARs, that may be the cause.





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111437314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    What @jerryshao said.
    
    But I also don't see the need to create any abstraction until it's necessary. So really there's no point in implementing it at this point. If you want to use the hypothetical argument of supporting a new FS, I'll give you the argument that such FS would be implemented as a `FileSystem` and it would automatically hook up to `FsUrlStreamHandlerFactory`, so no need to modify Spark.
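    A common way to reconcile "register `FsUrlStreamHandlerFactory` directly" with the once-per-JVM limit is an idempotent guard around the registration. A sketch under the assumption that any `URLStreamHandlerFactory` stands in for Hadoop's (the names here are illustrative, not Spark's actual code):

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // Guard so that repeated initialization paths register the factory at most once.
    private static final AtomicBoolean registered = new AtomicBoolean(false);

    static void registerFactoryOnce(URLStreamHandlerFactory factory) {
        if (registered.compareAndSet(false, true)) {
            URL.setURLStreamHandlerFactory(factory);
        }
    }

    public static void main(String[] args) {
        URLStreamHandlerFactory factory = protocol -> null; // stand-in for FsUrlStreamHandlerFactory
        registerFactoryOnce(factory);
        registerFactoryOnce(factory); // no-op: does not throw "factory already defined"
        System.out.println("registered once");
    }
}
```

    This keeps the direct-registration approach while tolerating multiple `SharedState`-style initializations in one JVM.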




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75887/
    Test FAILed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17342




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75885/
    Test FAILed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107456555
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    Thanks @steveloughran. Shall I hold up this PR and wait for the API or config option to be ready? Are they on your schedule? Or shall I just finish this PR first and then make changes when the new API is ready? I don't know if the customer can wait yet.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75973/
    Test FAILed.

