Posted to reviews@spark.apache.org by weiqingy <gi...@git.apache.org> on 2017/03/18 19:25:34 UTC

[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

GitHub user weiqingy opened a pull request:

    https://github.com/apache/spark/pull/17342

    [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

    ## What changes were proposed in this pull request?
    Spark 2.2 is about to be cut; it would be great if SPARK-12868 could be resolved before that. There have been several PRs for this, such as [PR#16324](https://github.com/apache/spark/pull/16324), but all of them have been inactive for a long time or have been closed.
    
    This PR adds a SparkUrlStreamHandlerFactory, which uses the URL's protocol to choose the appropriate
    URLStreamHandlerFactory (e.g. FsUrlStreamHandlerFactory) to create the URLStreamHandler.
    
    ## How was this patch tested?
    1. Added a new unit test.
    2. Checked manually.
    Before: an exception was thrown with "failed unknown protocol: hdfs":
    <img width="914" alt="screen shot 2017-03-17 at 9 07 36 pm" src="https://cloud.githubusercontent.com/assets/8546874/24075277/5abe0a7c-0bd5-11e7-900e-ec3d3105da0b.png">
    
    After:
    <img width="1148" alt="screen shot 2017-03-18 at 11 42 18 am" src="https://cloud.githubusercontent.com/assets/8546874/24075283/69382a60-0bd5-11e7-8d30-d9405c3aaaba.png">
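
    For illustration, the delegating-factory idea can be sketched with the JDK alone; a stub handler stands in for Hadoop's FsUrlStreamHandlerFactory, and all names here are illustrative rather than the PR's exact code:

    ```scala
    import java.net.{URL, URLConnection, URLStreamHandler, URLStreamHandlerFactory}

    // Stand-in for a real handler: the PR delegates to Hadoop's
    // FsUrlStreamHandlerFactory, which knows how to open HDFS streams.
    class StubHandler extends URLStreamHandler {
      override def openConnection(url: URL): URLConnection =
        throw new UnsupportedOperationException("stub: no real I/O")
    }

    // Delegating factory in the spirit of SparkUrlStreamHandlerFactory:
    // returning null tells the JVM to fall back to its built-in handlers.
    class DelegatingFactory extends URLStreamHandlerFactory {
      def createURLStreamHandler(protocol: String): URLStreamHandler =
        if (protocol.equalsIgnoreCase("hdfs")) new StubHandler else null
    }

    object Demo {
      def main(args: Array[String]): Unit = {
        URL.setURLStreamHandlerFactory(new DelegatingFactory)
        val u = new URL("hdfs://namenode:8020/test.jar") // no longer malformed
        println(u.getProtocol) // prints "hdfs"
      }
    }
    ```

    Constructing the URL only requires a registered handler for the scheme; no connection is opened, which is why a test of this needs no network.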

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/weiqingy/spark SPARK-18910

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17342
    
----
commit 04556c9f2f4feb53e3f644d795a38de4a4e919ca
Author: Weiqing Yang <ya...@gmail.com>
Date:   2017-03-18T18:55:28Z

    [SPARK-18910] Allow adding jars from hdfs

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    CC @vanzin @tgravescs , can you please also review this PR? Thanks.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by Chopinxb <gi...@git.apache.org>.
Github user Chopinxb commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    @steveloughran Sorry for the delay, and I really appreciate you creating this issue: [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697)


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76128/
    Test PASSed.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107001274
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    API? no, just fs.*.impl for the standard ones, discovery via META-INF/services and you don't want to go there. Probably better to have a core list of the hadoop redists (including the new 2.8+ adl & oss object stores), and the google cloud URL (gss ? )
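
    For context, the `fs.*.impl` mechanism mentioned above is plain Hadoop configuration: each URI scheme maps to a FileSystem class, and those mappings are what FsUrlStreamHandlerFactory consults. A hypothetical `core-site.xml` fragment (both the scheme and the class name are invented for illustration):

    ```xml
    <!-- Hypothetical mapping: tells Hadoop which FileSystem implementation
         backs the "myfs" scheme. Both names are illustrative. -->
    <property>
      <name>fs.myfs.impl</name>
      <value>com.example.MyFileSystem</value>
    </property>
    ```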


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76000/testReport)** for PR 17342 at commit [`48069cc`](https://github.com/apache/spark/commit/48069ccb17785bf4a406459d382b13e70b2e704e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74790/
    Test FAILed.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    @steveloughran Thanks Steve.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75885 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75885/testReport)** for PR 17342 at commit [`be0257b`](https://github.com/apache/spark/commit/be0257b6e527e1fbe4b5f19991f20189f04ba426).


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Will check, but it looks like it's related to SPARK-21697.


---



[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74792/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r112374078
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     }
     
     object SharedState {
    +  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    --- End diff --
    
    Good point. I have updated the PR. Thanks.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r113088010
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage)
         }
       }
    +
    +  test("SPARK-12868: Allow adding jars from hdfs ") {
    +    val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
    +    val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL(jarFromHdfs)
    +    var exceptionThrown: Boolean = false
    --- End diff --
    
    Replace this whole block with:
    
    ```
    intercept[MalformedURLException] {
      new URL(jarFromInvalidFs)
    }
    ```
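
    Outside a ScalaTest suite, the suggested `intercept` check boils down to the following plain-Scala behavior (the "fffs" scheme is invented for the negative case):

    ```scala
    import java.net.{MalformedURLException, URL}

    object MalformedDemo {
      def main(args: Array[String]): Unit = {
        // Constructing a URL only needs a registered handler for its scheme;
        // an unknown scheme fails immediately, with no network I/O involved.
        try {
          new URL("fffs://doesnotmatter/test.jar")
          println("no exception")
        } catch {
          case _: MalformedURLException =>
            println("MalformedURLException") // this branch runs
        }
      }
    }
    ```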


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Hi, @rxin Could you please review this PR? Thanks.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107713074
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    @jerryshao I was thinking about using reflection to check whether the API exists and if it exists then we have a whole solution. Maybe it's not worth. I'll just support hdfs for now.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75061/testReport)** for PR 17342 at commit [`bf0dbf9`](https://github.com/apache/spark/commit/bf0dbf9c53e9b2081c595f0a5026405b0839f513).


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107063651
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    I am not sure which file systems `FsUrlStreamHandlerFactory` supports. Maybe for now we just put "hdfs" in, and we can add more when users actually need them?


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107069396
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    We can simply call `URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())` and everything works happily. The only problem is that `URL.setURLStreamHandlerFactory` can only be called once per JVM (check the javadoc of the URL class), so if we want to support more stream handlers in the future, we wouldn't be able to call `URL.setURLStreamHandlerFactory` again. It's like you only have one wall plug but several laptops, so you have to use a power strip.
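
    The once-per-JVM restriction is easy to demonstrate with the JDK alone (no Hadoop involved): the second registration attempt fails with a `java.lang.Error`.

    ```scala
    import java.net.{URL, URLStreamHandler, URLStreamHandlerFactory}

    object OncePerJvm {
      // A factory that handles nothing: returning null falls back to the
      // JDK's built-in handlers (http, file, jar, ...).
      private val noop = new URLStreamHandlerFactory {
        def createURLStreamHandler(protocol: String): URLStreamHandler = null
      }

      def main(args: Array[String]): Unit = {
        URL.setURLStreamHandlerFactory(noop)
        try {
          URL.setURLStreamHandlerFactory(noop) // second call per JVM is rejected
        } catch {
          case _: Error => println("factory already defined")
        }
      }
    }
    ```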


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111870688
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1021,4 +1021,19 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
         assert(redactedConf("spark.regular.property") === "not_a_secret")
       }
    +
    +  test("SparkUrlStreamHandlerFactory") {
    +    URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL("hdfs://docs.oracle.com/test.jar")
    --- End diff --
    
    The test also works without network connection.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Jenkins, test this please


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    @Chopinxb no worries; the hard part is thinking how to fix this. I don't see it being possible to do reliably except through an explicit download. Hadoop 2.8+ has moved off commons-logging so this problem *may* have gone away. However, there are too many dependencies to be confident that will hold


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75887 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75887/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    `org.apache.spark.storage.BlockManagerProactiveReplicationSuite.proactive block replication - 3 replicas - 2 block manager deletions` failed, but it passed locally. 


---


[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75061/
    Test PASSed.


---


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.


---


[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107215428
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Say one day we need to support a third-party file system named "shaofs", which provides a ShaoUrlStreamHandlerFactory. We cannot install both FsUrlStreamHandlerFactory and ShaoUrlStreamHandlerFactory in the same JVM (URL.setURLStreamHandlerFactory can only be called once), but we can install SparkUrlStreamHandlerFactory, which delegates to ShaoUrlStreamHandlerFactory or FsUrlStreamHandlerFactory as appropriate.
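    The delegation idea described above can be sketched as follows. This is a hypothetical composite factory, not Spark's actual implementation; the class name and `register` method are illustrative, and the delegates stand in for factories like FsUrlStreamHandlerFactory:

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical composite factory: routes each protocol to the delegate
// factory registered for it. Since URL.setURLStreamHandlerFactory can only
// be called once per JVM, a composite like this is the usual way to support
// several backends at once.
public class CompositeUrlStreamHandlerFactory implements URLStreamHandlerFactory {
    private final Map<String, URLStreamHandlerFactory> byProtocol = new LinkedHashMap<>();

    public void register(String protocol, URLStreamHandlerFactory factory) {
        byProtocol.put(protocol.toLowerCase(), factory);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        URLStreamHandlerFactory factory = byProtocol.get(protocol.toLowerCase());
        // Returning null tells java.net.URL to fall back to its built-in handlers.
        return factory == null ? null : factory.createURLStreamHandler(protocol);
    }
}
```

    A new backend then only needs a `register("shaofs", new ShaoUrlStreamHandlerFactory())` call, rather than another JVM-wide `setURLStreamHandlerFactory` attempt.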




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Build finished. Test FAILed.




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74790/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106977805
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    Yeah, that's a good point. I'll check Hadoop for the full list of supported file systems and, ideally, whether we can get them via some API.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107583524
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    IMHO, we should not rely on a Hadoop 2.8+ feature; Spark's supported Hadoop version is 2.6, so it would be better to have a general solution that avoids depending on a specific version of Hadoop.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r113103389
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
    @@ -2606,4 +2607,19 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
           case ae: AnalysisException => assert(ae.plan == null && ae.getMessage == ae.getSimpleMessage)
         }
       }
    +
    +  test("SPARK-12868: Allow adding jars from hdfs ") {
    +    val jarFromHdfs = "hdfs://doesnotmatter/test.jar"
    +    val jarFromInvalidFs = "fffs://doesnotmatter/test.jar"
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL(jarFromHdfs)
    +    var exceptionThrown: Boolean = false
    --- End diff --
    
    Thanks. PR has been updated.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111303973
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    You could proxy to another `URLStreamHandlerFactory` when `FsUrlStreamHandlerFactory#createURLStreamHandler` returns null.
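    The null-fallback chaining suggested here could look like the sketch below. The class name is hypothetical; the delegates stand in for FsUrlStreamHandlerFactory and any third-party factory, each of which is expected to return null for protocols it does not recognize:

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Arrays;
import java.util.List;

// Hypothetical chaining factory: tries each delegate in order and uses the
// first non-null handler, so a primary factory (e.g. FsUrlStreamHandlerFactory)
// can be extended with fallbacks without protocol-name comparisons.
public class ChainedUrlStreamHandlerFactory implements URLStreamHandlerFactory {
    private final List<URLStreamHandlerFactory> delegates;

    public ChainedUrlStreamHandlerFactory(URLStreamHandlerFactory... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        for (URLStreamHandlerFactory d : delegates) {
            URLStreamHandler handler = d.createURLStreamHandler(protocol);
            if (handler != null) {
                return handler;
            }
        }
        return null; // let java.net.URL fall back to its built-in handlers
    }
}
```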




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107064456
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    I don't think you have to do the comparison here; Hadoop itself will discover the supported file systems through `fs.*.impl` and the service loader.
    
    Note that HDI uses wasb by default, so the assumption here ("hdfs") may break their code.
    
    This leads to the question below: why do we need to wrap `FsUrlStreamHandlerFactory` at all? The only difference is one extra check for whether the protocol is hdfs, which I think is unnecessary and already handled by Hadoop.
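    The service-loader discovery mentioned above can be illustrated with a minimal sketch. This mirrors the general JDK SPI mechanism (providers listed under `META-INF/services/<interface-name>` on the classpath), not Hadoop's exact internals; the class and method names are illustrative:

```java
import java.net.URLStreamHandlerFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical sketch of SPI-based discovery, in the spirit of how Hadoop
// finds FileSystem implementations: each jar declares its providers in
// META-INF/services/<interface-name>, and ServiceLoader instantiates them
// lazily at iteration time. No hard-coded protocol comparisons are needed.
public class SpiDiscovery {
    public static List<URLStreamHandlerFactory> discoverFactories() {
        List<URLStreamHandlerFactory> found = new ArrayList<>();
        for (URLStreamHandlerFactory f : ServiceLoader.load(URLStreamHandlerFactory.class)) {
            found.add(f);
        }
        return found;
    }
}
```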




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107064584
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Can you explain more? I don't see a specific difference between your changes and `FsUrlStreamHandlerFactory` with regard to being `called once per JVM`.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    I'm going to recommend you file a SPARK bug on issues.apache.org, with a linked HDFS issue, "NPE in BlockReaderFactory log init". It looks like the creation of the LOG for BlockReader triggers introspection, which causes BlockReaderFactory to do something before it is fully initialized, and then possibly NPE because the LOG field is null.
    





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111365777
  
    --- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
    @@ -1021,4 +1021,19 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
         secretKeys.foreach { key => assert(redactedConf(key) === Utils.REDACTION_REPLACEMENT_TEXT) }
         assert(redactedConf("spark.regular.property") === "not_a_secret")
       }
    +
    +  test("SparkUrlStreamHandlerFactory") {
    +    URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    +
    +    // if 'hdfs' is not supported, MalformedURLException will be thrown
    +    new URL("hdfs://docs.oracle.com/test.jar")
    --- End diff --
    
    you should check to see what happens when you run this test on a machine with no network connection. Everyone hates tests that fail when they rely on DNS working (or, in some cases, DNS not resolving an example.org domain)
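    One way to keep such a test network-independent: constructing a `java.net.URL` only resolves the protocol handler and performs no DNS lookup (nothing touches the network until `openConnection`/`connect`), so an unroutable hostname is safe. The helper below is an illustrative sketch, not the test in this PR:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class NoNetworkUrlCheck {
    // Returns true iff a URLStreamHandler exists for the URL's protocol.
    // URL construction does no DNS resolution or I/O, so this check works
    // on a machine with no network connection at all.
    public static boolean protocolRecognized(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "unknown protocol: fffs"
        }
    }
}
```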




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by Chopinxb <gi...@git.apache.org>.
Github user Chopinxb commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Have you tried it in yarn-client mode? I applied this patch to v2.1.1 + Hadoop 2.6.0, and when I run "add jar" through the SparkSQL CLI, it throws this error:
    ERROR thriftserver.SparkSQLDriver: Failed in [add jar  hdfs://SunshineNameNode3:8020/lib/clouddata-common-lib/chardet-0.0.1.jar]
    java.lang.ExceptionInInitializerError
    	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
    	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
    	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:947)
    	at java.io.DataInputStream.read(DataInputStream.java:100)
    	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
    	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
    	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
    	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:369)
    	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:341)
    	at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:292)
    	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2107)
    	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2076)
    	at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2052)
    	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1274)
    	at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
    	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
    	at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
    	at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
    	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:632)
    	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:601)
    	at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:278)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:267)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:601)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:591)
    	at org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:738)
    	at org.apache.spark.sql.hive.HiveSessionState.addJar(HiveSessionState.scala:105)
    	at org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
    	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
    	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
    	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
    	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
    	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
    	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
    	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
    	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
    	at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:699)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:62)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
    	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:743)
    	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.NullPointerException
    	at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:746)
    	at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:376)
    	at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:662)
    	at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:889)
    	at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:947)
    	at java.io.DataInputStream.read(DataInputStream.java:100)
    	at java.nio.file.Files.copy(Files.java:2908)
    	at java.nio.file.Files.copy(Files.java:3027)
    	at sun.net.www.protocol.jar.URLJarFile$1.run(URLJarFile.java:220)
    	at sun.net.www.protocol.jar.URLJarFile$1.run(URLJarFile.java:216)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.net.www.protocol.jar.URLJarFile.retrieve(URLJarFile.java:215)
    	at sun.net.www.protocol.jar.URLJarFile.getJarFile(URLJarFile.java:71)
    	at sun.net.www.protocol.jar.JarFileFactory.get(JarFileFactory.java:84)
    	at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:122)
    	at sun.net.www.protocol.jar.JarURLConnection.getJarFile(JarURLConnection.java:89)
    	at sun.misc.URLClassPath$JarLoader.getJarFile(URLClassPath.java:934)
    	at sun.misc.URLClassPath$JarLoader.access$800(URLClassPath.java:791)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:876)
    	at sun.misc.URLClassPath$JarLoader$1.run(URLClassPath.java:869)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath$JarLoader.ensureOpen(URLClassPath.java:868)
    	at sun.misc.URLClassPath$JarLoader.<init>(URLClassPath.java:819)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:565)
    	at sun.misc.URLClassPath$3.run(URLClassPath.java:555)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:554)
    	at sun.misc.URLClassPath.getLoader(URLClassPath.java:519)
    	at sun.misc.URLClassPath.getNextLoader(URLClassPath.java:484)
    	at sun.misc.URLClassPath.access$100(URLClassPath.java:65)
    	at sun.misc.URLClassPath$1.next(URLClassPath.java:266)
    	at sun.misc.URLClassPath$1.hasMoreElements(URLClassPath.java:277)
    	at java.net.URLClassLoader$3$1.run(URLClassLoader.java:601)
    	at java.net.URLClassLoader$3$1.run(URLClassLoader.java:599)
    	at java.security.AccessController.doPrivileged(Native Method)
    	at java.net.URLClassLoader$3.next(URLClassLoader.java:598)
    	at java.net.URLClassLoader$3.hasMoreElements(URLClassLoader.java:623)
    	at sun.misc.CompoundEnumeration.next(CompoundEnumeration.java:45)
    	at sun.misc.CompoundEnumeration.hasMoreElements(CompoundEnumeration.java:54)
    	at org.apache.commons.logging.LogFactory.getConfigurationFile(LogFactory.java:1409)
    	at org.apache.commons.logging.LogFactory.getFactory(LogFactory.java:455)
    	at org.apache.commons.logging.LogFactory.getLog(LogFactory.java:657)
    	at org.apache.hadoop.hdfs.BlockReaderFactory.<clinit>(BlockReaderFactory.java:77)
    	... 58 more




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r108775285
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    > Say one day we need to support a third-party file system named "shaofs"
    
    So why not add the abstraction on that day? If `FsUrlStreamHandlerFactory` does what is needed here, and correctly supports more than just the `hdfs` protocol (which isn't even enough to cover HDFS itself, which can be federated), then I don't see what the abstraction is buying us today.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75887 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75887/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107381697
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    FWIW, we've been backing out of using service discovery for the filesystem clients built into Hadoop (e.g. HADOOP-14138). Why? It was hurting startup times, especially once we'd switched to the fully-shaded, own-Jackson version of the AWS SDK. From Hadoop 2.8+, at least for now, you get the list of internal ones from a scan of config options. But we reserve the right to change that in future.
    
    I'd be amenable to having an API call in which FileSystem lists all URL schemas which the JVM knows about. That doesn't mean that they will load, only that it knows the implementation classname.
    
    I've also considered having a config option which lists all schemas it knows are object stores, a simple comma-separated list, where we could include things like google gss:, even though it's not bundled. Why? It lets apps downstream such as Spark, Hive and Flink see if a filesystem is an object store, without having to add a whole new API. And if it's in the list, they can expect different behaviours, like expensive renames. That, being just an overridable config option, is inexpensive to add.
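The comma-separated config idea above can be sketched as follows. This is an illustrative standalone sketch, not Hadoop code: the key name `fs.object.store.schemas` is invented for this example and is not a real Hadoop property.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

class ObjectStoreSchemas {
    // Hypothetical key, invented for this sketch; not a real Hadoop property.
    static final String KEY = "fs.object.store.schemas";

    // Parses a comma-separated schema list such as "s3a, wasb, gs".
    static Set<String> parse(String value) {
        if (value == null) {
            return new LinkedHashSet<>();
        }
        return Arrays.stream(value.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    // Downstream apps could consult the list to anticipate object-store
    // behaviours (e.g. expensive renames) without a new FileSystem API.
    static boolean isObjectStore(String scheme, Set<String> schemas) {
        return schemas.contains(scheme.toLowerCase(Locale.ROOT));
    }
}
```

Being a plain string option, it can be overridden per deployment without any code change, which is the inexpensive part of the proposal.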





[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75886 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75886/testReport)** for PR 17342 at commit [`7c2d61a`](https://github.com/apache/spark/commit/7c2d61a81d492aa1b259071e9c9af5a01320dbfb).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76000/testReport)** for PR 17342 at commit [`48069cc`](https://github.com/apache/spark/commit/48069ccb17785bf4a406459d382b13e70b2e704e).




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r112019602
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Thanks for the comments. I have updated the PR. 




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75905/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107326694
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    I see, I get your point. Can you please also address the other remaining issues?




[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106842764
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    +      if (hdfsHandler == null) {
    +        hdfsHandler = new FsUrlStreamHandlerFactory().createURLStreamHandler(protocol)
    --- End diff --
    
    I think you should call this constructor, `public FsUrlStreamHandlerFactory(Configuration conf)`, and use the `Configuration` created by `SparkHadoopUtils`.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106978348
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    +      if (hdfsHandler == null) {
    +        hdfsHandler = new FsUrlStreamHandlerFactory().createURLStreamHandler(protocol)
    --- End diff --
    
    Thanks, I'll follow your suggestion.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Well, no obvious answer there I'm afraid, except "don't put HDFS JARs on the classpath"; if you serve them up via HTTP, all should work.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Sorry about the stack above. The actual error is as below; that's why this relates to SPARK-21697.
    Removing the commons-logging jar fixes it, but copying the commons-logging jar back never triggers the complaint again, which doesn't make sense.
    
    18/08/20 11:47:29 ERROR SparkSQLDriver: Failed in [select check('abc')]
    java.lang.ExceptionInInitializerError
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
    ....
    ..
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.NullPointerException
            at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:685)
            at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74790/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106977152
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    `URL#setURLStreamHandlerFactory` can only be called once per JVM. If we set `FsUrlStreamHandlerFactory` directly, we won't be able to support other factories. I wrapped it for future extensibility.
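The dispatch-by-protocol pattern this wrapper uses can be sketched as below. This is an illustrative sketch, not the PR's code: the Hadoop `FsUrlStreamHandlerFactory` delegate is replaced by whatever factory the caller registers, so the sketch runs with only the JDK.

```java
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Dispatches per-protocol to a registered delegate factory. Returning null
// tells the JDK to fall back to its built-in handlers (http, file, jar, ...).
class DispatchingFactory implements URLStreamHandlerFactory {
    private final Map<String, URLStreamHandlerFactory> delegates = new ConcurrentHashMap<>();

    void register(String protocol, URLStreamHandlerFactory factory) {
        delegates.put(protocol.toLowerCase(Locale.ROOT), factory);
    }

    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        URLStreamHandlerFactory delegate = delegates.get(protocol.toLowerCase(Locale.ROOT));
        return delegate == null ? null : delegate.createURLStreamHandler(protocol);
    }
}
```

Since `URL.setURLStreamHandlerFactory` is one-shot, registering the single dispatching wrapper once leaves room to add delegates for new schemes later without touching the JVM-level registration again.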




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Nobody else seems to have comments, so I'll merge to master / 2.2.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    At a guess, there's possibly a mix of Hadoop HDFS JARs on your classpath. Are you sure everything on the classpath is in sync? What hadoop-hdfs JARs are there?




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r113061767
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     }
     
     object SharedState {
    +  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    --- End diff --
    
    Hi, @vanzin Could you please help to review this PR again? Thanks.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    The failures, I think, were not triggered by this code change. Will re-trigger Jenkins.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75973/testReport)** for PR 17342 at commit [`be16d1a`](https://github.com/apache/spark/commit/be16d1a23d30ad1a031aed4a15e6a7ee3dd51d45).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75905/
    Test PASSed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r112029478
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -146,6 +149,7 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     }
     
     object SharedState {
    +  URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    --- End diff --
    
    I'm wondering if it's better to add a try..catch around this:
    
    ```
    scala> URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    
    scala> URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory())
    java.lang.Error: factory already defined
      at java.net.URL.setURLStreamHandlerFactory(URL.java:1112)
      ... 48 elided
    ```
    
    Normally this wouldn't matter, but if someone is messing with class loaders (e.g. running Spark embedded in a web app in a servlet container), they may run into situations where this code might run twice, or may even fail in the first time (if the user's application also installs a stream handler).
    
    So I think it's safer to catch the error and print a warning message here. But really optimal would be if the "add jar" code didn't use URL at all for this. That's for a future change though.
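The guard suggested above can be sketched like this. Names are illustrative, not Spark's actual code: `NoopFactory` stands in for Hadoop's `FsUrlStreamHandlerFactory` so the sketch runs without Hadoop on the classpath.

```java
import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

class FactoryGuard {
    // Stand-in factory; returning null defers every protocol to the JDK's
    // built-in handlers, so installing it changes no behaviour.
    static class NoopFactory implements URLStreamHandlerFactory {
        public URLStreamHandler createURLStreamHandler(String protocol) {
            return null;
        }
    }

    /**
     * Returns true if this call installed the factory, false if one was
     * already defined in this JVM.
     */
    static boolean trySetFactory(URLStreamHandlerFactory factory) {
        try {
            URL.setURLStreamHandlerFactory(factory);
            return true;
        } catch (Error e) {
            // URL.setURLStreamHandlerFactory throws java.lang.Error
            // ("factory already defined") on any call after the first.
            System.err.println("URL stream handler factory already set: " + e.getMessage());
            return false;
        }
    }
}
```

Catching `Error` here is deliberate, despite being unusual: the JDK signals the duplicate registration with `java.lang.Error` rather than an exception, so a warning-and-continue path has to catch exactly that.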




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76000/
    Test PASSed.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75061/testReport)** for PR 17342 at commit [`bf0dbf9`](https://github.com/apache/spark/commit/bf0dbf9c53e9b2081c595f0a5026405b0839f513).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111300718
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Hi, @vanzin @gatorsmile Could you please give some direction on which way to go? We need to support hdfs anyway.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r110511571
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    In a [prior PR](https://github.com/apache/spark/pull/16324), `FsUrlStreamHandlerFactory` was set on the JVM's `URL` class directly. @gatorsmile raised the concern that `URL.setURLStreamHandlerFactory` can be called only once per JVM, and that is the motivation of this PR. Either one is OK for me; however, we've got to choose one.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75886/
    Test FAILed.




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #74792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74792/testReport)** for PR 17342 at commit [`04556c9`](https://github.com/apache/spark/commit/04556c9f2f4feb53e3f644d795a38de4a4e919ca).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107075363
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    But with your `SparkUrlStreamHandlerFactory` wrapper, why can it be called more than once? I didn't get the point here.
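    For readers following along: the wrapper being questioned is, in essence, a protocol-dispatching factory. A minimal self-contained sketch of the idea (the stub handler is illustrative only, not the PR's actual code, which delegates to Hadoop's `FsUrlStreamHandlerFactory`):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class Main implements URLStreamHandlerFactory {
    @Override
    public URLStreamHandler createURLStreamHandler(String protocol) {
        if ("hdfs".equalsIgnoreCase(protocol)) {
            // Stub handler so the sketch is self-contained; the real code
            // would hand back a Hadoop filesystem handler here.
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL u) throws IOException {
                    throw new IOException("stub handler for " + u);
                }
            };
        }
        return null; // unknown protocol: the JVM falls back to its built-in handlers
    }

    public static void main(String[] args) {
        URLStreamHandlerFactory f = new Main();
        System.out.println(f.createURLStreamHandler("hdfs") != null); // prints true
        System.out.println(f.createURLStreamHandler("http") == null); // prints true
    }
}
```

    Returning `null` for unrecognized protocols is what lets the wrapper coexist with the JVM's default http/https/file handlers.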




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74792/
    Test PASSed.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Thanks, @vanzin .




[GitHub] spark issue #17342: [SPARK-18910][SPARK-12868] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Can you update the title to add the `[SQL]` module tag? It looks like a SQL-specific problem.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #75905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75905/testReport)** for PR 17342 at commit [`0d30271`](https://github.com/apache/spark/commit/0d302717b85cdf2d4c35eebcb97795a456fd1bed).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Jenkins, test this please




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106841997
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    IMHO I think directly register `FsUrlStreamHandlerFactory` with `URL#setURLStreamHandlerFactory` should be enough, it is not necessary to wrap again.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Created: [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697) with the stack trace attached




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by rajeshcode <gi...@git.apache.org>.
Github user rajeshcode commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Patching this to spark-2.1.0, I hit several issues:
    java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.BlockReaderFactory
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
            at java.io.DataInputStream.read(DataInputStream.java:100)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:338)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
            at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:2030)
            at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1999)
            at org.apache.hadoop.fs.FileSystem.copyToLocalFile(FileSystem.java:1975)
            at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1274)
            at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
            at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
            at org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
            at org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
            at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:633)
            at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:602)
            at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:287...
    ....
            at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hdfs.BlockReaderFactory
            at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
            at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
            at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
            at java.io.DataInputStream.read(DataInputStream.java:100)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:85)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:59)
            at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:119)
            at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:366)




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    **[Test build #76128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76128/testReport)** for PR 17342 at commit [`fb1ee81`](https://github.com/apache/spark/commit/fb1ee811e12f05c5d31880e6d88f306148612c18).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107331216
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    Yes, I will update the PR soon.




[GitHub] spark pull request #17342: [SPARK-18910][SPARK-12868] Allow adding jars from...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r106841824
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    It looks like you only support HDFS here; `FsUrlStreamHandlerFactory` can actually support different Hadoop-compatible file systems. Should we also support others, like `wasb` and `webhdfs`?





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r110517523
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    Sorry, missed this. There's nothing explicit in 2.8+ right now; don't hold your breath. If people do want to co-develop one, I'd be happy to help. There's no point in me implementing something which isn't useful or going to be used by downstream projects.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111303746
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    @weiqingy , even if you add a wrapper to try to support different stream handler factories, it is not a good idea to filter on hdfs only. `FsUrlStreamHandlerFactory` supports different fs implementations by default; it is not necessary for the upstream code to check for them deliberately. 




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by steveloughran <gi...@git.apache.org>.
Github user steveloughran commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Hmmm. [SPARK-21697](https://issues.apache.org/jira/browse/SPARK-21697) has a lot of the CP, but the problem in that one is some recursive loading of artifacts off HDFS, the scan for commons-logging.properties being the trouble spot. 
    
    @rajeshcode , what you have seems more like a classic "class not found" problem, where one class is loading but a dependency isn't being found. And as HDFS has moved its stuff around, splitting the one hadoop-hdfs JAR into separate client and server JARs, that may be the cause.





[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r111437314
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SharedState.scala ---
    @@ -148,6 +149,8 @@ private[sql] class SharedState(val sparkContext: SparkContext) extends Logging {
     
     object SharedState {
     
    +  URL.setURLStreamHandlerFactory(new SparkUrlStreamHandlerFactory())
    --- End diff --
    
    What @jerryshao said.
    
    But I also don't see the need to create any abstraction until it's necessary. So really there's no point in implementing it at this point. If you want to use the hypothetical argument of supporting a new FS, I'll give you the argument that such FS would be implemented as a `FileSystem` and it would automatically hook up to `FsUrlStreamHandlerFactory`, so no need to modify Spark.
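    A common way to reconcile "register `FsUrlStreamHandlerFactory` directly" with the once-per-JVM limit is an idempotent guard around the registration. A sketch under the assumption that any `URLStreamHandlerFactory` stands in for Hadoop's (the names here are illustrative, not Spark's actual code):

```java
import java.net.URL;
import java.net.URLStreamHandlerFactory;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // Guard so that repeated initialization paths register the factory at most once.
    private static final AtomicBoolean registered = new AtomicBoolean(false);

    static void registerFactoryOnce(URLStreamHandlerFactory factory) {
        if (registered.compareAndSet(false, true)) {
            URL.setURLStreamHandlerFactory(factory);
        }
    }

    public static void main(String[] args) {
        URLStreamHandlerFactory factory = protocol -> null; // stand-in for FsUrlStreamHandlerFactory
        registerFactoryOnce(factory);
        registerFactoryOnce(factory); // no-op: does not throw "factory already defined"
        System.out.println("registered once");
    }
}
```

    This keeps the direct-registration approach while tolerating multiple `SharedState`-style initializations in one JVM.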




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75887/
    Test FAILed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17342




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75885/
    Test FAILed.




[GitHub] spark pull request #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by weiqingy <gi...@git.apache.org>.
Github user weiqingy commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17342#discussion_r107456555
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2767,3 +2767,24 @@ private[spark] class CircularBuffer(sizeInBytes: Int = 10240) extends java.io.Ou
         new String(nonCircularBuffer, StandardCharsets.UTF_8)
       }
     }
    +
    +
    +/**
    + * Factory for URL stream handlers. It relies on 'protocol' to choose the appropriate
    + * UrlStreamHandlerFactory to create URLStreamHandler. Adding new 'if' branches in
    + * 'createURLStreamHandler' like 'hdfsHandler' to support more protocols.
    + */
    +private[spark] class SparkUrlStreamHandlerFactory extends URLStreamHandlerFactory {
    +  private var hdfsHandler : URLStreamHandler = _
    +
    +  def createURLStreamHandler(protocol: String): URLStreamHandler = {
    +    if (protocol.compareToIgnoreCase("hdfs") == 0) {
    --- End diff --
    
    Thanks @steveloughran. Shall I hold up this PR and wait for the API or config option to be ready? Are they on your schedule? Or shall I just finish this PR first and then make changes when the new API is ready? I don't know if the customer can wait yet.




[GitHub] spark issue #17342: [SPARK-12868][SQL] Allow adding jars from hdfs

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17342
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75973/
    Test FAILed.

